Preface to the Third Edition 


The first edition of this book came out more than 20 years ago, and the second more 
than 10. A lot has gone on during that 20-year time span, both in the book’s subject 
matter and in our own professional lives. When we wrote the first edition, we were 
junior academics, and the research methods literature was much smaller and easier to 
master than it is now. We learned an enormous amount in the course of writing that 
first edition text; as has frequently been observed (originally by the physicist Frank 
Oppenheimer, according to Wikipedia), the best way to learn something is to teach it. 
As our careers have progressed, so has the methodological literature, which seems to 
have outgrown our own capacity (and probably anyone else’s) to keep up with it. 
Such is its volume and complexity that producing this third edition from the second has 
seemed as big a task as producing the first from scratch. However, we have 
once again relished getting to grips with the new ideas ourselves and attempting to 
communicate them clearly to our readers. 

Since the previous edition, there have been major changes in how information is 
accessed and processed, and in how research is conceptualized and conducted. Some 
of the most important additions or changes in this edition are systematic review 
methods and literature-searching methods (see Chapter 3), structured guidelines for 
appraising the research literature (see Chapters 3 and 8) and for preparing journal 
articles (see Chapter 8), modern psychometric methods (e.g., item response theory, 
see Chapter 4), guidance on choosing between different qualitative approaches (see 
Chapter 5), and the internet as a medium for conducting psychological research (see 
Chapters 6 and 10). 

When we began updating the second edition to produce this one, we initially 
thought that we would completely revamp the references, as several had endured 
since the first edition and were written before many of our readers would have been 
born. We had a general “out with the old, in with the new,” “let’s clear out the attic” 
attitude. However, as the writing progressed, it quickly became apparent that many of 
the old references actually hold up rather well, several being classic papers that all 
clinical psychologists need to be aware of. So, while we have updated many of the 
citations, the end result represents what we hope is a judicious mix of ancient and 
modern. 
The choice of title led to some debate among the authors and publishers. The first 
edition, which was entitled Research Methods in Clinical and Counseling Psychology, 
had its genesis in our teaching on clinical and counseling psychology courses. The 
second edition, entitled Research Methods in Clinical Psychology, focused on clinical 
psychologists as a primary readership, with counseling, health, educational, and 
community psychologists also being very much in our minds. The book should really 
be called something like Research Methods in Clinical Psychology and Allied 
Professions, but that is too clunky and unfocused. In our time, we have taught research 
methods to students and professionals in many other allied fields, including health, 
community, counseling, and educational psychology, psychiatry, speech therapy, and 
nursing. We want this text to be accessible to all of these audiences and more. We 
hope that potential readers from other disciplines will judge the book by its content, 
not just its title: we intend it to be useful not only for clinical psychologists but also 
for a broad range of mental health disciplines. 

We have once again tried to make the text reader-friendly by having frequent 
bullet-point summaries of the important points in boxes, and a chapter summary and 
suggested reading at the end of each chapter. In this edition, we have added questions 
for self-reflection, also at the end of each chapter. Personal preferences are an 
often unacknowledged influence on the research that one conducts, and the questions 
for reflection are designed to help readers explore what they think and feel 
about the various approaches and issues that we have described in each chapter. We 
have also, as with the last edition, uploaded supplementary material for readers and 
instructors onto the book’s website. 

A few matters of grammar and style are worth noting. We have generally preferred 
vernacular to supposedly purist forms of expression. Thus, following recent trends, we 
have usually used the colloquial “they” to indicate a single person of unspecified 
gender, rather than the awkward-sounding “he or she.” “Data” is treated as a collective 
noun either in the singular or the plural, as sense dictates, as in common speech. We 
are fully aware that it is a plural noun in Latin, but like “agenda,” also a Latin plural, 
it is frequently used in the singular in spoken English. We have also not hesitated to 
boldly split infinitives: the supposed rule prohibiting this practice now seems 
antiquated. 

As with previous editions, we have tried to make this one relevant both to North 
American and to British readers. We are a transatlantic authorship team (one Brit, one 
American, and one who is both), although we are all currently working in the United 
Kingdom. Due to limitations in our abilities and experience, we have restricted most 
of our examples to the English-speaking world. However, we have taught research 
methods in other countries, and have had some instructive correspondence with our 
Asian, African, and Australian readers, so we hope that the book can be useful to 
readers outside of North America and the British Isles. 

The first two authors are fortunate to work at University College London (UCL) 
in London’s Bloomsbury district, which is probably the best place on the planet for 
library access. For this book, we have relied on three excellent libraries (the UCL 
library, the University of London Research Library, and the British Library), which 
are all within easy walking distance. UCL has provided us with an outstanding selection 
of electronic journals, the University of London Research Library has a superb 
reference collection of psychology books for browsing, and the British Library is a 
magnificent public resource capable of supplying our every bibliographic want. Long 
may these institutions flourish! 

Revising this book has also brought home once more what an excellent research 
methods education we three all received in our graduate school days at the University 
of California, Los Angeles. We were exposed to the full gamut of methodological 
options by first-rate statistics and measurement instructors in the Psychology 
Department and innovative qualitative researchers in Sociology. This book is a tribute 
to all of our own instructors and mentors. 

We are grateful to our many academic friends and colleagues—both past and 
present—in our own universities and our wider scientific circles, for inspiring us, 
keeping us up to date, and challenging us. We would also like to thank the following 
for their help with preparing the current edition. Several colleagues gave us suggestions 
or generously commented on chapter drafts: John Cape, Kate Cheney, James 
Coyne, Ravi Das, Allen Dyer, Peter Fonagy, Andy Fugard, Vyv Huddy, Zoe Huntley, 
Narinder Kapur, John King, Henry Potts, Tony Roth, James Schuurmans-Stekhove, 
and Francine Wood. Special thanks to Will Mandy for looking at several chapters at 
short notice. Marie Brown capably assisted with the library research, efficiently chasing 
up some of the more obscure references, and road-tested several parts of the text. 
Rachel Schon kindly assisted with the indexing. Shamil Wanigaratne and Sue Salas 
have been encouraging and supportive readers over three editions (and three countries). 
Our thanks to the team at Wileys: Andrew McAleer, who first encouraged us to 
undertake this rewrite, Karen Shield, our project editor, Amy Minshull, the editorial 
assistant, Nivedha Gopathy, the project manager, and Stephen Curtis, our eagle-eyed 
copy-editor. Thanks also to those who helped with previous editions: John Cape, 
Lorna Champion, Linda Clare, Michael Coombs, Neil Devlin, Jerry Goodman, Les 
Greenberg, Dick Hallam, Connie Hammen, Wendy Hudlass, Maria Koutantji, David 
Rennie, Laura Rice, Joe Schwartz, Pam Smith, and Mark Williams with the first 
edition, and Anna Barker, Chris Brewin, John Cape, Kate Cheney, Pasco Fearon, Dick 
Hallam, David Shapiro, Jonathan Smith, Lesley Valerio, and Vivian Ward with the 
second. And, finally, many thanks to all of our students, past and present, for their 
engagement with our teaching and supervision, and for continuing to keep us on 
our toes. 

Even though we have benefited enormously from the advice and scrutiny of our 
colleagues and students, the responsibility for any residual errors remains our own. 
The process of preparing this edition has unearthed some minor mistakes in the 
previous one, and doubtless others still lurk herein. If you spot something wrong, 
please let us know, and we will post a correction on the book’s website. We appreciate 
any feedback, positive, negative, or neutral, from our readers. We hope that this book 
will prove a useful resource in your own consumption or production of research, or in 
simply appreciating what a complex business it all is. 



About the Companion Website 


The companion website for the book, at www.wiley.com/go/barker, provides 
supplementary material for readers, both students and instructors. For each chapter 
there are PowerPoint slides, questions for reflection, internet resources, and more. 
Introduction: The Research Process 


KEY POINTS IN THIS CHAPTER 

• Research tells a story. 

• Research raises questions as well as answering them. 

• There is a vigorous debate within psychology about what constitutes legitimate 
research. 

• This text takes a stance of methodological pluralism: of fitting the research 
method to the research question. 

• The research process can be divided into four main stages: groundwork, 
measurement, design, and analysis/interpretation. 


Research tells a story. Ideally, it resembles a detective story, which begins with a mystery 
and ends with its resolution. Researchers have a problem that they want to investigate; 
the story will reach its happy ending if they find a solution to that problem. 

In practice, however, things aren’t quite that simple, and the actual picture is closer 
to an adventure story, with many unexpected twists and turns. Often, the resolution of 
a research project is uncertain: it doesn’t answer your initial research question; rather, 
it tells you that you were asking the wrong question in the first place, or that the way 
that you went about answering it was misconceived. You struggle with discouragement 
and frustration; perhaps you come out of it feeling lucky to have survived the thing 
with your health and relationships (mostly) intact. So, if you enjoy research and are 
determined to make a contribution, you organize a sequel, in which you try out a 
better question with a better-designed study, and so it goes on. Another way of putting 
it is that there are stories within stories, or a continuing series of stories. Each individual 
research project tells one story, the series of projects conducted by a researcher or a 
research team forms a larger story, and the development of the whole research area a 
yet larger story. And this progression continues up to the level of the history of science 
and ideas over the centuries. 

Another way that things are not so simple is that not all researchers agree on what 
constitutes a legitimate story. The situation in psychology is analogous to developments 
in literature. On the one hand is the traditional research story, rather like a Victorian 
novel, which has a clear beginning, middle, and end, and is expected to provide a more 
or less faithful reflection of reality. On the other hand, in this modern and postmodern 
age, we encounter narratives that do not follow an orderly chronological sequence or tie 
up neatly at the end. Furthermore, they may not claim to represent, or may even reject 
the idea of, reality. 

These developments in literature and psychology reflect general intellectual developments 
during the last century, which have ramifications across many branches of 
European and English-speaking culture, both artistic and scientific. Our own field of 
interest, psychology in general and clinical psychology in particular, has been going 
through a vigorous debate about the nature of research - that is, which of these narratives 
we can call research and which are something else. Scholars from various corners of the 
discipline of psychology (e.g., Carlson, 1972; Driver-Linn, 2003; Gergen, 2001; Rogers, 
1985; Sarbin, 1986) have questioned the validity and usefulness of psychology’s version of 
the traditional story, which has been called “received-view” or “old-paradigm” research: 
essentially a quantitative, hypothetico-deductive approach, which relies on linear causal 
models. These and other critics call for the traditional approach to be replaced, or at least 
supplemented, by a more qualitative, discovery-oriented, nonlinear approach to research. 

This debate, as Kimble (1984) pointed out, is a contemporary manifestation of 
William James’s (1907) distinction between tough-minded and tender-minded 
ways of thinking, which is itself a translation into psychological terms of the old 
debate in philosophy over empiricism (Aristotle) versus rationalism (Plato). 
However, it is simplistic to view this debate as two-sided, with researchers being 
either in one camp or the other. It is better viewed as reflecting multiple underlying 
attitudes, for example, preferences for quantitative versus qualitative methods, 
attitudes towards exploratory versus confirmatory research questions, experimental 
control versus real-world relevance, and so on (Kimble, 1984). 

One consequence of the lack of consensus about acceptable approaches to research is 
that people who are doing research for the first time may experience considerable 
anxiety, rather like the existential anxiety that accompanies a loss of meaning (Yalom, 
1980). Undertaking a research project without being clear about what standards are to 
be used to evaluate it is an unsettling experience. Furthermore, there is a political 
dimension, since people in powerful positions in the academic world (journal editors, 
grant reviewers, and university professors) often adhere to the more traditional models. 

This anxiety is exacerbated because the rules are not always made explicit, which may 
make beginning researchers feel, like Alice in Wonderland, that they are in a strange 
country with mysterious and arbitrary rules that are continually being changed. 
Researchers are constantly reminded, in various ways, to behave themselves properly in 
accordance with these scientific rules; as the Red Queen said to Alice, “Look up, speak 
nicely and don’t twiddle your fingers all the time!” This experience can be understandably 
off-putting for people trying to enter the research wonderland for the first time. 

We will reconsider these issues in Chapters 2, 4, and 5, which address the conceptual 
underpinnings of research. However, it is worth stating at the outset that our own 
stance is one of methodological pluralism. We don’t think that any single approach to 
research (or, indeed, that psychological research itself) has all the answers; thus, we 
believe that researchers need to have at their disposal a range of methods, appropriate 
to the problems being investigated. We have considerable sympathy with the critics of 
the received view, but are not convinced that the consequence of accepting their 
criticisms is to abandon traditional quantitative methods, or even research in general. 
Indeed, we feel that to do so would be a disaster for psychology and for society. 
Fortunately, we see increasing signs that it is possible to articulate a synthesis of the 
old- and new-paradigm traditions, that there are general principles common to 
rigorous research within whatever paradigm, and that it is possible to lay out an overall 
framework which organizes different approaches to research and clarifies the ways in 
which they can complement one another. Learning to do psychological research is 
partly a process of learning disciplined enquiry according to these principles within 
this general framework. 

At the same time, there are rules of good practice specific to each type of research. We 
will base our methodological pluralism on a principle of appropriate methodologies (by 
analogy to the catch phrase “appropriate technology” in the economics of development). 
By this, we mean that the methods used should flow out of the research questions asked. 
Different questions lend themselves to different methods. To resume our literary analogy, 
like the different literary genres (mystery, romance, science fiction, autobiography, etc.), 
we can think of different research genres, such as survey research, randomized clinical 
trials, systematic case studies, and in-depth qualitative interview studies. Each of these 
research genres has different stories to tell and different rules of good practice. 

We will attempt to clarify these general principles and specific rules of good practice, 
so that you will be in a better position to appreciate other people’s research. We hope 
that this will help you feel less intimidated about the prospect of conducting your own 
research. Also, there is value in making the rules of research explicit, so that one can 
challenge them more effectively, and thus contribute to the debate about how 
psychological research should be conducted. 

Research is demanding: it does require clear and rigorous thought, as well as 
perseverance and stamina, but it is also fascinating and exciting, and, we hope, beneficial 
to the public that psychologists ultimately profess to serve. 


The Research Process 

This book is structured around a simple chronological framework, which we call the 
research process: that is, the sequence of steps that researchers go through during a 
project. The steps can be grouped into four major stages. Like all such frameworks, 
it is idealized, in that the stages are not always distinct and may interact with each 
other. However, we find it a useful way of thinking about how research is conducted, 
both one’s own and other people’s. 
1. Groundwork (Chapter 3). This stage involves both scientific issues (choosing the 
topic, reviewing the literature, specifying the conceptual model, formulating the 
research questions) and practical issues (resolving organizational, political, 
financial, or ethical problems). Sometimes researchers give the groundwork short 
shrift, being anxious to get on with the business of running the project itself. 
However, we will argue that devoting careful thought at this stage repays itself 
with interest during the course of the project. 

2. Measurement (Chapters 4 to 7). Having formulated the research questions, the 
next step is to decide how to measure the psychological constructs of interest. 
We are here using the term “measurement” in its broadest sense, to encompass 
qualitative as well as quantitative approaches to data collection. 

3. Design (Chapters 8 to 11). Research design issues concern when and from whom 
the data will be collected. For example: Who will the participants be? Will there be 
an experimental design with a control group? How many pre- and post-assessments 
will there be? What ethical concerns need to be addressed? These design issues can 
usually be considered independently of measurement issues. 

The research questions, measurement procedures, and design together constitute 
the research protocol, the blueprint for the study. Having gone through these 
first three stages, researchers will usually conduct a small pilot study, whose results 
may cause them to rethink the protocol and possibly to conduct further pilots. 
Eventually the protocol is finalized; the last stage then consists of implementing it. 

4. Analysis, interpretation, and dissemination (Chapter 12). The data are collected, 
analyzed, interpreted, written up, possibly published, and, let us hope, acted upon. 

These stages in the research process constitute our framework for the book. However, 
we will also examine some key philosophical, professional, and political issues that are 
central to thinking about the whole research enterprise (Chapters 2, 4, and 5). 
Although following these arguments is not necessary for learning purely technical 
research skills, it is important to understand the wider context in which research is 
being conducted, as doing so will lead to more focused, coherent, and ultimately 
useful research programs. It is also important to keep in mind that doing research is 
much more than the exercise of a set of techniques; carrying out research involves 
imagination and empathy, problem-solving skills and critical thinking, and ethical 
reflection and social responsibility. 

The first part of this background material is given in the next chapter, which analyzes 
the meaning of some of the terms we have so far left undefined, such as “research” 
itself. We will also discuss why anyone might want to engage in research at all. 
KEY POINTS IN THIS CHAPTER 

• Psychological research is situated within philosophical, professional, 
personal, and political contexts. 

• The process of psychological research is similar to that of open-minded 
enquiry in everyday life. 

• Several philosophers have attempted to characterize the essence of scientific 
progress: Popper, Kuhn, and Feyerabend are central figures. 

• Social and political forces shape the development of science. 

• The scientist-practitioner model is a central part of clinical psychology’s 
professional ideology, but there is often a gap between rhetoric and reality. 

• Practicing clinical psychologists may choose to do research, or not to, for a 
variety of reasons. 


This chapter examines some important background issues, in order to give a sense of the 
context in which research is conducted. These cover the “three P’s”: the philosophical 
framework (i.e., the underlying set of assumptions about the research process), the 
professional context (i.e., how research fits into clinical psychology’s professional 
identity), and also the personal context (i.e., each individual researcher’s own attitudes 
towards research). In the background there is also the fourth P, the political context. 

Understanding these contextual issues is helpful both in reading other people’s 
research and also in conducting your own. It helps make sense of other people’s research 
if you understand the framework within which it was conducted. If you are doing 
research yourself, it follows that the more you are aware of your assumptions, the more 
you are able to make informed choices about what methods to use, rather than following 
available examples blindly (Elliott, 2008). This is similar to clinical work, where 
clients who have greater insight into their motivating forces are generally better able to 
live freer and more productive lives, and therapists who are able to step outside of their 
own perspective are better able to understand and help their clients (Rogers, 1975). 
However, again as in clinical work, making decisions can be hard work as you become 
aware of the multiple possibilities of action instead of making automatic choices. 

The chapter has three sections, covering philosophical, professional, and personal 
issues. Political issues are touched on in all three sections. 


PHILOSOPHICAL ISSUES 

This section examines what is meant by two key terms: research and science. However, 
we need to start out with a couple of disclaimers. First, several of the ideas are complex 
and require philosophical expertise to appraise them properly. We do not possess 
such expertise, nor do we expect the great majority of our readers to. Second, 
grappling with difficult issues such as the nature of reality at this early stage can be 
heavy going. As is the case in all philosophy, there are more questions than answers. 
We attempt to give an overview of some interesting contemporary issues; it is not 
necessary to follow them in detail in order to conduct or critique research. However, 
having a broad grasp of them will help you understand (perhaps more clearly than the 
researchers themselves do) what a piece of research is attempting to achieve. 

Philosophical issues that relate more specifically to psychological measurement 
(namely discussion of the positivist, phenomenological, and social constructionist 
positions) are covered in Chapters 4 and 5. 


What is Research? 


• Conducting research is essentially a circular activity (see Figure 2.1). 

• Research requires psychological flexibility and open-mindedness. 

• Research is not the only way to acquire psychological understanding: literature, 
life experience, and supervised clinical work are also important. 

• The main reason for following rigorous research methods is to minimize 
bias and reduce errors in drawing conclusions. 

• A rudimentary understanding of epistemology (the theory of knowledge) 
helps to elucidate some basic procedures and distinct stances towards 
research (e.g., critical realism and constructionism). 


As Figure 2.1 suggests, the research process is a potentially everlasting circle. Our 
human propensity to understand ourselves and the world that we live in has been noted 
since ancient times. Plato had Socrates say (in the Apology, 38) that “the unexamined 
life is not worth living.” Some writers, for instance, Cook and Campbell (1979), 
Figure 2.1 The research cycle 


consider that the psychological roots of research have evolutionary significance: that 
there is survival value in our attempts to understand the world and ourselves. 

Note that this circular model does not attempt to explain where we get our ideas 
from in the first place. There is a long-standing debate in philosophy and developmental 
psychology, which we will sidestep for the moment, about whether acquiring 
knowledge of the world is possible without some previous understanding. Our 
emphasis is on how educated adults discover and test ideas. 

Research demands a degree of psychological flexibility, that is, an ability to modify 
one’s ideas if they are not supported by the evidence. It may be helpful to view various 
sorts of disruptions in the circular model as corresponding to various maladaptive 
psychological styles. For instance, a refusal to interact with the world at all, elaborating 
theories without ever testing them against the “real world” (i.e., never moving down 
off the first stage of our circular model), is a solipsistic stance of building dream castles 
with no basis in reality - a stance captured in the epithet used to describe out-of-touch 
academics: “the ivory tower.” This refusal to gather information also characterizes 
someone who is overconfident in the value of their ideas, and does not see any need 
to put them to any kind of empirical test. (Politicians often seem to fall into this category, 
with the result that many aspects of our society, such as education, the penal 
system, and health care, are largely determined by ideology rather than evidence.) 

Problems in the lowest quadrant of the circle include biases in analyzing or interpreting 
the data: allowing what you want to get from a research project to distort how 
you report what actually happened. Our data are always influenced to some extent by 
our values and preconceptions; after all, these determine what we choose to study in 
the first place, what we count as data, what we select as important to report from 
amongst our findings, and inevitably the conclusions we draw about the world from 
our research. Indeed, Bayes’s theorem holds that drawing inferences from research to 
the world is impossible without taking prior assumptions into account (Dienes, 2011). 
In extreme cases, however, researchers’ personal circumstances or ideological commitments 
may lead them to ignore or suppress unwanted findings, or even to fabricate 
results (Pashler & Wagenmakers, 2012). While extreme cases of scientific dishonesty 
are probably rare, each of us is subject to self-deception, which may lead to distorting 
our results in subtle ways, the most common of which is simply dismissing our own 
or other people’s results that don’t fit our preconceptions. 
Similar problems exist in the final step of the circular model: the refusal to modify 
one’s ideas, because one dismisses or distorts the evidence, which characterizes a rigid, 
dogmatic stance. This can be seen in people who cling to various kinds of orthodoxies 
and fundamentalist beliefs in the face of contrary evidence. (Politicians often seem to 
fall into this category too!) 

While passions and personal feuds make science more interesting, and have always 
helped drive it forward, we believe that curiosity and an inquiring, open-minded 
research attitude are part of good psychological functioning. This is similar to 
Jahoda’s (1958) concept of “adequate perception of reality” as one criterion for 
positive mental health. 

Thus far, our characterization of research applies to everyday life as much as to 
organized science. We all do research informally; it is one way that we form our 
mental representations of the world. This is what Reason and Rowan (1981) call 
“naive enquiry.” George Kelly (1955) elaborated the metaphor of the person as a 
scientist into an entire theory of personality: that people are continually building and 
testing their set of “personal constructs.” However, cognitive and social scientists 
have also shown that people display pervasive biases in the way that they process 
information (Fiske & Taylor, 2013; Kahneman, 2011; Nisbett & Ross, 1980). The 
fundamental reason for the development of rigorous research methods is to attempt 
to minimize biases in drawing conclusions from evidence. 

Finally, we should make it clear at the outset that we do not see research as being 
the only, or even an especially privileged, route to knowledge. One can learn much of 
relevance to psychology from the works of Shakespeare, Tolstoy, George Eliot, or 
James Joyce (to name a few of our own favorites). Great works of art or literature will 
often have a ring of truth that will immediately resonate with the viewer or reader. 
Furthermore, everyday life experiences also help build a knowledge base. In Morrow- 
Bradley and Elliott’s (1986) survey of sources of psychotherapeutic knowledge, 
therapists reported that they learned most from experience with their clients, followed by 
theoretical or practical writings, being a client themselves, supervision, and practical 
workshops. Research presentations and research reports were ranked first by only 10% 
of the sample of practicing therapists (in contrast to experience with clients, which was 
ranked first by 48%). 

However, the strength of formal research is that it is a systematic way of looking at 
the world and of describing its regularities, and it provides knowledge that can allow 
us to decide between conflicting claims to truth that may be put forward by rival 
proponents. New approaches to treatment are constantly being developed, and usually the 
person who develops the therapy will offer some preliminary evidence for its 
effectiveness. One example of a therapy that has gained widespread attention is multisystemic 
therapy (MST) for adolescent conduct disorders (Henggeler, Melton, & Smith, 1992). 
However, it has also attracted controversy about the quality of its supporting evidence 
(Littell, 2006), which has mostly been produced by the model’s proponents. Until 
several rigorous studies have been conducted by researchers without a theoretical 
allegiance to the model, we will not be able to properly evaluate its effectiveness and 
mechanisms of action. 

Furthermore, because research is a shared, public activity, it has a crucial role in 
contributing to the development of theory and professional knowledge. Interactions 
with clients, conversations with fellow professionals, and personal growth experiences 
are all useful ways of educating oneself individually, but research, theoretical writings, 
and published case reports are public documents and therefore contribute to the 
development of the profession as a whole. 

We will explore such professional issues more fully in the next section, and then, in 
the final section, discuss why individual psychologists might (or might not) want to 
do research. However, before we can do this, we need to examine the meaning of 
some of our core terminology in greater depth. 

Definition of “Research” 

The Oxford English Dictionary’s definition of “research” serves as a good working 
definition. It is: “A search or investigation directed to the discovery of some fact by 
careful consideration or study of a subject; a course of critical or scientific enquiry.” 
Five aspects of this definition are noteworthy. 

First, the definition stresses the methodical aspect of research, that research is careful 
and disciplined. It is a craft that requires considerable dedication and attention to 
detail. There is also, however, a chance element to research: not all discoveries are 
necessarily planned, and serendipity often enters in (Merbaum & Lowe, 1982). The 
classic example of an accidental scientific discovery is Fleming’s isolation of penicillin, 
when he noticed that some mold in a dish stopped the growth of bacteria he was 
attempting to cultivate. However, to take advantage of a chance discovery, the 
researcher must have the knowledge and insight to appreciate its significance, and then 
the persistence to follow it up. As Louis Pasteur, the microbiologist who invented the 
rabies vaccine, is reputed to have said, “In the fields of observation, chance favors 
only the mind that is prepared” (O’Brien & Bartlett, 2012). 

Second, the definition specifies a critical or detached attitude. This attitude is an 
important feature of the clinical psychology discipline. Clinical psychologists are 
trained to question the basis of professional practice, for example, “What’s going on 
here?”; “How do you know that?”; “What’s the evidence for that assertion?” This 
skeptical attitude does not always endear them to their colleagues from other mental 
health disciplines: it can at times lapse into rigid adherence to a narrow form of 
scientific practice (e.g., large randomized clinical trials), and may contribute to the 
common perception of psychologists as standing at one step removed from the other 
professionals in a team or service. 

Third, the definition does not specify the method of research, suggesting the value 
of both rational and empirical investigation. While rational or conceptual research is 
sometimes denigrated in psychology as “speculation” or “armchair philosophizing,” 
it is essential in other disciplines, especially the humanities, and is the method of choice 
in mathematics (the “queen of the sciences”) and theoretical physics, both of which 
proceed from axioms to deductions. Psychology is primarily an empirical science, 
concerned with systematically gathering data, which are then used, in ways we will 
discuss below, to develop and test its theories. However, there is also an important role 
for conceptual research, to formulate theories, to explicate underlying principles, and 
to identify the assumptions underlying research (Slife & Williams, 1995). This issue of 
research method relates back to the centuries-old philosophical debate between 
rationalists and empiricists over the sources of human knowledge (Russell, 1961). 
Fourth, the definition states that research is a process of discovery. This raises the 
distinction between exploratory research, which sets out to find something new, and 
confirmatory research, which sets out to evaluate existing theory (see Chapter 3). 
Philosophers of science make a similar distinction between the context of discovery 
and the context of justification of a particular finding (Reichenbach, 1938). We 
include both exploratory and confirmatory approaches under the definition of 
research, and see both as equally valid and useful. 

Finally, the definition says that research is directed towards the discovery of facts. 
The Oxford English Dictionary defines a fact as “something that has really occurred 
or is the case.” However, this definition begs some difficult philosophical questions 
about how we come to know what is true, and requires some consideration of the 
philosophical basis of truth and knowledge. 

Epistemology 

The theory of knowledge is known as epistemology; it is the area of philosophy devoted 
to describing how we come to know things or believe them to be true or real. In fact, 
when psychologists talk about validity and reliability, in either quantitative 
psychometrics (see Chapter 4) or qualitative research (see Chapter 5), they are talking in 
epistemological terms. According to Hamlyn (1970; see also Packer & Addison, 
1989), there are four fundamental epistemological positions, or criteria of truth: 

1. The correspondence theory of truth, the basis of realist philosophies, holds that a 
belief is true if it matches reality. 

2. Coherence theory, the basis of rationalist philosophies, holds that a belief is true if 
it is internally consistent or logically non-contradictory. 

3. The pragmatist or utilitarian criterion holds that a belief is true if it is useful or 
produces practical benefits. 

4. The consensus criterion, the basis of sociological theories of knowledge (see 
below), holds that a belief is true if it is shared by a group of people. 

None of these theories is completely adequate: all have serious logical flaws. For 
example, correspondence theory involves an infinite regress, because reality must be 
measured validly before the degree of correspondence can be assessed. (This is referred 
to as the criterion problem in measurement.) Furthermore, counterinstances of each 
of the other three criteria can readily be imagined (e.g., an elegant, coherent theory 
which has no bearing on reality; a false belief which nevertheless proves useful; and a 
false consensus or collective delusion). On the other hand, all four theories have some 
value as practical but fallible guidelines (Anderson, Hughes, & Sharrock, 1986), 
suggesting the importance of a pluralist epistemology. Optimally, one would attempt 
to realize all four truth criteria in one’s research (cf. Elliott, Fischer, & Rennie, 1999). 

Realism and Constructionism 

Physical scientists often implicitly work from a realist position, which is based on a 
correspondence theory of truth. Realism posits that there is a real world out there, 
independent of whoever may be observing it (Bhaskar, 1975). Thus the rocks of the 
moon have a geological composition that is, at least in principle, discoverable: that 
some people may believe the moon to be made of green cheese is irrelevant. Within 
this realist framework, the task of the scientist is to understand as accurately as 
possible the properties of the real world. Scientists themselves might say that they are 
trying to understand Nature. 

For most of the past 100 years, psychologists have also emphasized a correspondence 
theory of truth, although in the latter half of the 20th century this evolved into a 
critical realist position (Cook & Campbell, 1979). This assumes that there exists a real 
world out there that has regularities. However, we can never know it with certainty: 
all our understandings are essentially tentative. The critical realist position emphasizes 
the replicability of research: that other researchers should be able to repeat your work 
and get approximately the same results, or in more technical language, that knowledge 
should be “intersubjectively testable” (Cook & Campbell, 1979; Popper, 1959). This 
means that researchers must be explicit about how they collected their data and drew 
their conclusions, so that other researchers can evaluate their conclusions or replicate 
the study themselves. Beyond this, it suggests that researchers should approach the 
same topic using different methods, with complementary strengths and weaknesses, a 
strategy of “triangulation” (Creswell, 2009; Tashakkori & Teddlie, 2009), a term 
taken from geometry and surveying. Thus, critical realists go beyond correspondence 
theory to include consensus and coherence truth criteria. 

In the last two decades of the 20th century, various challenges to realist and critical 
realist philosophies emerged. These approaches emphasize either coherence or 
consensus theories of truth and try to eliminate correspondence criteria. The major 
current alternative to the critical realist position can be found in the various forms of 
constructionism and constructivism, some of which overlap considerably with 
postmodernism (Gergen, 2001; Guba & Lincoln, 1989; Neimeyer, 1993) and with narrative 
approaches (Bruner, 1991; Riessman, 2008). These are fairly imprecise terms, but 
they share a common stance of dispensing with the assumption of an objective reality 
and instead studying people’s interpretations or stories (see Chapter 5 for further 
discussion). Postmodernists are impatient with what they call “grand theory”; instead 
they present a more multifaceted, fractured world view, some taking the extreme 
point of view that there are no true and false stories, only different stories. The central 
problem with such radical constructionist or postmodernist views is that not all 
constructions or stories are equally interesting, consistent, replicable, shared, useful, or 
even accurate. That smoking causes lung cancer or that poverty reduces one’s quality 
of life, though not unassailable propositions, seem to describe important consistencies 
in the world. 

Social constructionists emphasize the social construction of reality and see the 
research setting as a specialized form of social interaction, a situation for eliciting and 
studying people’s stories. They argue that researchers are not detached observers, but 
actively play a part in what they are studying and how they make sense of it (McGrath & 
Johnson, 2003). Thus, the collection, analysis, and interpretation of data involve 
processes of active construction. A related point is the interdependence of the knower 
and the known, which is emphasized by constructivists, like Piaget (1970), Vygotsky 
(1978), and Bruner (1987). That is to say, in coming to know a thing, both the state 
of our knowledge and the thing itself may be changed; what we call facts are a joint 
construction of the things themselves and our knowing process. For example, the 
process of interviewing a client about her reactions to a recent therapy session may 
change both the way that the interviewer understands the process of therapy, and the 
way that the client feels about the session, her therapist, or herself. 

Pure and Applied Research 

There are many ways to classify research, for example, according to content, setting, 
population, or method. One important distinction is between basic academic research 
and applied (including evaluation) research. Although often presented as a dichotomy, 
the two positions are better thought of as two ends of a continuum (Milne, 1987; 
Patton, 2002). 

Basic (or pure) research addresses the generation and testing of theory. What are the 
underlying processes that help us understand the regularities in nature? Basic research 
emphasizes processes common to most people. Because clinical psychology is an 
applied discipline, basic research is rare, but examples of research toward the basic end 
of the spectrum include the relative contributions of relationship versus technique 
factors in therapy outcome in general, and the neuropsychological mechanisms 
involved in recalling traumatic memories. 

Applied research addresses practical questions, for example, whether a particular 
intervention works for a particular client group. At the far applied end of the 
spectrum is action research (Patton, 2002), carried out to address a particular local 
problem, such as the high dropout rate at a local psychotherapy service. Evaluation 
research also resides near the applied end of the spectrum, as it primarily addresses the 
general needs or outcomes of a particular agency or service, but may have a broader 
relevance. Evaluation is often motivated by pragmatic concerns, such as the need to 
maintain funding for a particular service. Although the methods used in pure and 
applied research overlap considerably, we will address some issues particular to 
evaluation research in Chapter 11. 

In actual practice, pure and applied research blend into each other. As the above 
examples of pure research demonstrate, there is often an element of application in 
clinical research: that is what makes it clinical. Many examples of clinical research lie 
on the middle ground. For instance, psychotherapy outcome research addresses 
questions of both theory and application. Since we see the pure/applied distinction 
as a continuum rather than a dichotomy, we adhere to a definition of research that 
encompasses the full spectrum, and can even be extended to clinical practice (a point 
we take up later in this chapter). 


What is Science? 

We have used the word “science” up to now without questioning its meaning. 
Yet there is a lively debate about what science consists of, a debate that goes to the 
heart of some enduring controversies within psychology and related fields. It addresses 
the question of how knowledge is acquired and which methods of research are 
“scientific” (and therefore respectable). In a much-used example, how can we 
distinguish between legitimate science and voodoo or astrology? Or is such a distinction 
only a social construction? Closer to home, in what sense is psychoanalysis a science? 
Or, indeed, psychology in general? 
Key points: 

• There is a lively debate within psychology about which methods are scientific 
and which are not. 

• Philosophers of science have attempted to define the unique characteristics 
of science. 

• Induction is the process of deriving theories from careful observations. The 
central problem with induction is the theory-dependence of observation. 

• Deduction is the process of making testable predictions from theories. It is 
the basis of the hypothetico-deductive model of science. 

• Popper proposed that good scientific theories should be testable and 
therefore potentially falsifiable. 

• Kuhn analyzed the historical progression of scientific thought in terms of his 
concepts of paradigms and scientific revolutions. 

• The sociology of knowledge examines the role of social and political forces 
in the development of scientific thought. 


The literature on this area is enormous: philosophy of science is an entire academic 
discipline in itself. Here we briefly review some central ideas. Since much 
undergraduate psychology education is implicitly based on a traditional view of science, it is 
important for psychologists to know about the positions presented here and in 
Chapters 4 and 5, in order to understand the context of the traditional view and to be 
aware of its alternatives. 

Induction 

An initial, common-sense way of attempting to characterize science is that it is based 
on careful observation, from which theories are then formulated. The derivation of 
theory from observation is known as induction, that is, going from the particular to 
the general. Astronomy is the classic example: astronomers gaze at the heavens, record 
what they see, and then try to spot the general pattern underlying their observations. 
Kepler’s 17th-century laws of planetary motion were derived in such a way, using the 
accumulated data of his predecessor, Tycho Brahe. Within psychology, clinical 
observation also uses induction. For example, the psychoanalyst carefully observes a number 
of patients within the analytic setting, and then attempts to formulate his or her 
impressions into a theory. This was the basis of Freud’s methods when he enunciated 
psychoanalytic theory at the beginning of the 20th century. 

Unfortunately, there are two insuperable problems with induction as a guiding 
principle of science (Chalmers, 2013). The first is that it is impossible to have pure 
observations: what we observe and how we observe it are, implicitly or explicitly, 
based on theory. This phenomenon is known as the theory-dependence of observation. 
For example, a psychoanalyst, a Skinnerian behaviorist, and a lay person will notice 
very different things in a videotape of a therapy session. The second problem is that 
there is no logical basis for the principle of induction. Because something has been 
observed to happen on ten occasions, it does not necessarily follow that it will happen 
on the eleventh. This means that theories can never be conclusively verified, only 
temporarily corroborated by scientific evidence, resulting in probabilistic rather than 
necessary truths. The philosopher Karl Popper, who was a contemporary of Freud 
and Adler in 1920s Vienna, expressed this point of view forcefully. It is worth giving 
an extended quotation, which is of enduring relevance to psychologists: 

I found that those of my friends who were admirers of Marx, Freud, and Adler, were 
impressed by a number of points common to these theories, and especially by their 
apparent explanatory power. These theories appeared to be able to explain practically 
everything that happened within the fields to which they referred ... 

The most characteristic element in this situation seemed to me the incessant stream 
of confirmations, of observations which ‘verified’ the theories in question; and this 
point was constantly emphasized by their adherents. ... The Freudian analysts 
emphasized that their theories were constantly verified by their “clinical 
observations.” As for Adler, I was much impressed by a personal experience. Once, in 1919, 
I reported to him a case which to me did not seem particularly Adlerian, but which 
he found no difficulty in analyzing in terms of his theory of inferiority feelings, 
although he had not even seen the child. Slightly shocked, I asked him how he could 
be so sure. “Because of my thousand fold experience,” he replied; whereupon I could 
not help saying: “And with this new case, I suppose, your experience has become 
thousand-and-one fold.” 

What I had in mind was that his previous observations may not have been much 
sounder than this new one; that each in its turn had been interpreted in the light of 
“previous experience,” and at the same time counted as additional confirmation ... 

I could not think of any human behavior which could not be interpreted in terms of 
either theory. It was precisely this fact - that they always fitted, that they were always 
confirmed - which in the eyes of their admirers constituted the strongest argument 
in favor of these theories. It began to dawn on me that this apparent strength was in 
fact their weakness. (Popper, 1963: 34-35, reproduced by permission) 

This quotation illustrates several important issues: (1) the limits of a verificationist 
approach (i.e., the approach taken by Adler of supporting his theory by looking for 
confirming instances) - good theories should be potentially capable of 
disconfirmation; (2) the problems of post-hoc explanation (it is easy to fit a theory to facts after 
the event); (3) the theory-dependence of observation (e.g., Adlerians tend to interpret 
everything in terms of inferiority complexes); and, finally, (4) the temptation for 
scientists to jump to conclusions without careful data gathering - Adler might have 
been more convincing if he had actually seen the child in question. 

However, despite these major problems with induction, we are not suggesting that 
it be abandoned altogether, rather that it be conducted within a rigorous framework 
and complemented by other approaches, such as deduction and falsification. We will 
return to this in several subsequent chapters, especially in the section on systematic 
case studies in Chapter 9. 

Deduction and Falsification 

Having rejected the principle of induction as a sole, secure foundation for science, 
Popper attempted to turn the problem on its head: he looked at solutions based on 
deduction rather than induction, on falsification rather than verification. Deduction is 
the converse of induction: it means going from the theory to a testable prediction, 
known as a hypothesis. This approach to research, which is the traditional scientific 
approach within psychology, is known as the hypothetico-deductive method. 

Popper’s landmark volume, The Logic of Scientific Discovery (1959), set out to 
establish a demarcation between science and non-science (or “pseudo-science”). His 
central criterion was that a science must be able to formulate hypotheses that are 
capable of refutation or, in his preferred terminology, falsification. For example, 
Newtonian physics generates the proposition that a ball thrown up in the air will 
come down to land again. If tennis balls started shooting out into space, the theory 
would have a lot of explaining to do. In a more technical example, Newtonian 
physics also generates the proposition that light travels in a straight line. Although 
this proposition seems almost self-evident, it was ultimately falsified in a spectacular 
way by Eddington’s expedition to observe a solar eclipse in Africa in order to test a 
deduction from Einstein’s theory of relativity that light will bend in the presence of 
a gravitational field. 

In psychology, such unequivocal falsifications of theoretically derived predictions 
are less common. One area where they can be found is in neuropsychological case 
studies of patients with acquired brain damage. The presence of certain patterns of 
dysfunction in a single case can be used to refute general theories of mental structure 
(Shallice, 1988). 

As an example of a non-falsifiable theory, consider this statement, by the painter 
Mondrian: “The positive and the negative break up oneness, they are the cause of all 
unhappiness. The union of the positive and negative is happiness” (quoted by Wilson, 
1990: 144). This certainly appears to be some sort of psychological theory, but it is 
not clear to what extent it could generate falsifiable propositions, and thus what could 
be done to test its validity. According to Popper, a statement that cannot be falsified 
is unscientific (though it is not necessarily meaningless - religion and poetry may have 
meaning, but they are not falsifiable). 

For Popper, good science is characterized by a series of bold conjectures, which will 
be ultimately falsified. This approach is encapsulated in the title of one of his books, 
Conjectures and Refutations (1963). A good or productive theory is one that 
provides the basis for a large number of falsifiable propositions. A bad theory, or an 
unscientific one, is incapable of falsification. However, all theories must be considered 
to be tentative; it is impossible to know the world exactly. Every theory in its time will 
be falsified and replaced by another (as Newtonian mechanics was supplanted by 
Einstein’s theory of relativity). 

The falsifiability criterion places those fields which rely exclusively on post-hoc 
explanatory methods outside the boundaries of science. In particular, Popper (1963) 
used falsifiability to rule out psychoanalysis and Marxism, fashionable theories in 
Popper’s Vienna of the 1920s, which he explicitly pointed to as his main targets. On 
the other hand, operant behavioral approaches, with their philosophy of “prediction 
and control” (Skinner, 1953), would be included as scientific. 

This version of falsificationism has a number of problems. The main one is that no 
theory ever completely accounts for all the known findings. Inconsistencies always 
exist, but the theory may well be retained in spite of them, as they could be explained 
in other ways than the falsification of the theory, for example, measurement or design 
errors in the studies. Refutation is never cut and dried: there is always scope to deny 
that it has occurred. One historical example is in the debate over the effectiveness of 
psychotherapy. In a landmark paper, Eysenck (1952) claimed that psychotherapy 
showed no benefit above that of spontaneous remission, sparking years of controversy. 
This hypothesis seemed to have been finally laid to rest by Smith and Glass’s (1977) 
pioneering meta-analysis. However, Eysenck responded by dismissing the whole 
meta-analysis procedure, labeling it “an exercise in mega-silliness” (Eysenck, 1978, 
p. 517), continuing to deny that his hypothesis had been refuted. 

Abduction 

One recent development is the revival of interest in Charles Peirce’s (1965) notion of 
abduction (an awkward term, which has nothing to do with its usual sense of 
kidnapping), picked up by Haig (2005), Rennie (2012), and Stiles (2009). According to 
Peirce, abduction is a logical process that scientists use when faced with a surprising 
finding: they search with their imagination for possible explanations. This corresponds 
to Popper’s (1963) “bold conjecture” formulation, but it is more thoroughly worked 
out logically and is related to the processes of induction (“fact gathering” in Stiles’s 
(2009) formulation) and deduction (checking the theory for logical consistency and 
deriving implications to be tested for). The ideas are complex, but the key point is 
that working scientists use a combination of processes, cycling between careful data 
collection, creative leaps, and logical or statistical inference. 

Paradigms and Scientific Revolutions 

A central problem arising from Popper’s work is to explain how one theory is replaced 
by another. Since there are always unexplained or contradictory observations within a 
scientific field, what determines when one theory is rejected and replaced by another? 
This issue was the point of departure for the work of Thomas Kuhn, one of the central 
figures of 20th-century philosophy of science. In The Structure of Scientific Revolutions 
(1970), he applied the tools of historical analysis to address these questions. 

Kuhn proposed the concept of a paradigm, that is, the central body of ideas within 
which the great majority of scientists are working at any given time. The paradigm 
determines what phenomena scientists consider important and the methods that they 
use to make their observations. Scientists working within a paradigm are said to be 
doing normal science: they are elaborating theories rather than attempting to refute 
them. Eventually, the accumulated deficiencies of a paradigm lead to its overthrow 
and replacement by another paradigm, in what Kuhn labels a scientific revolution. For 
example, the replacement of the ancient Greek cosmology of Aristotle and Ptolemy 
(that the earth was the center of the universe) by Copernican theory (that the earth 
moves around the sun) in the 1500s was a scientific revolution. 

The concept of a paradigm is central to Kuhn’s work. It fits well with the physical 
sciences, but there is much debate about how well it can be applied in the social 
sciences (Driver-Linn, 2003; Lambie, 1991). Is there a guiding paradigm in clinical 
psychology? Or are there multiple paradigms, indicating that we are still in what 
Kuhn referred to as a pre-paradigmatic state? Arguably, cognitive-behavioral, 
psychodynamic, and humanistic approaches may be considered as concurrent competing 
paradigms, although this is perhaps overlooking the fundamental overlap between 
these approaches. 

Kuhn’s views and their relationship to those of Popper were hotly debated when 
they first appeared (Lakatos & Musgrave, 1970). Lakatos accused Kuhn of 
propounding a “mob psychology” (Lakatos, 1970, p. 178) of scientific revolutions, saying that 
his system contained no criteria for considering one paradigm an advance on another, 
and thus no sense in which scientific understanding could be said to be progressing. 

Feyerabend’s (1975) pragmatic-anarchistic view takes this to an extreme. Under a 
slogan of “anything goes” as long as it’s useful for carrying knowledge forward, 
Feyerabend appears to be claiming that different theories are “incommensurable” and 
that there are therefore no clear grounds for preferring one to another. So the 
anarchistic view would accord astrology, voodoo, and Babylonian astronomy the same 
scientific status as quantum mechanics or relativity (Chalmers, 2013). This viewpoint 
is pithily summed up in a rhyming couplet by the late poet and musician Moondog 
(n.d.): “What I say of science here, I say without condition/ That science is the latest 
and the greatest superstition” (Louis Hardin, Managarm; reproduced by permission). 

It seems as though the views of Popper and of Kuhn are themselves 
“incommensurable” in that they are each using different concepts to discuss somewhat different 
phenomena. Popper takes a logical approach, Kuhn a historical one. While trying to 
avoid the danger of falling into a relativist, “anything goes” position ourselves, we 
contend that much of value can be taken from both writers. 

From Popper, researchers can take the central admonition of making bold theories 
that lead to clear and risky predictions, and being ready to give these theories up in 
the face of contradictory evidence. Popper urges researchers to put their thoughts 
into clear and precise language. As an example, Rogers’s (1957) seminal paper on 
the necessary and sufficient conditions of therapeutic personality change is written 
with admirable clarity, and makes bold hypotheses about the central mechanisms of 
therapeutic change. 

Kuhn also encourages taking intellectual risks, though from a different standpoint. 
By clearly delineating the constrictions of “normal science,” he provides an implicit 
critique, helping researchers to be aware of and to question the assumptions of the 
paradigm within which they work, and to ask whether that paradigm is worth 
challenging. His work also leads scientists to look ahead to the next paradigm revolution 
and to ask whether their work will have any enduring value. 

Finally, the methodological pluralist stance that informs this book owes something 
to the spirit that animates Feyerabend’s writing. We agree with his stress on the value 
of diversity and the dangers of scientific conformity. We do, however, strongly
disagree with his rejection of the canons of scientific method. As we hope to show, it is
possible to articulate criteria to evaluate work conducted within the very different 
scientific traditions in clinical psychology. 


Social and Political Issues 

As Kuhn (1970) illustrates, science is not conducted in a cultural and political vacuum. 
It is carried out by scientists working within a particular scientific, professional, 
and cultural community at a specific moment in history. Sociologists of knowledge 
(e.g., Berger & Luckmann, 1966) and social constructionists (e.g., Gergen, 1985)
look at how social factors influence the development of thought and the role of
consensus in science. For example, what is seen as abnormal behavior varies from culture
to culture, and within cultures over time. 

Sociological and historical methods can be applied to examine science itself, to 
look at how socioeconomic and political forces shape the kind of science that is
practiced within a given culture (Chalmers, 1990; Schwartz, 1992), that is, how one set of
ideas gains prominence over another. These analyses have often been carried out within a
Marxist framework, which examines the influence of class interests on scientific 
thought (Albury & Schwartz, 1982). For example, genetic explanations of individual 
differences in IQ scores fit in well with racist and fascist ideologies, and some of the 
impetus behind the development of IQ tests probably arose from such a background 
(Rose, Kamin, & Lewontin, 1984; Rust & Golombok, 2008). 

An example within clinical psychology is the debate about “evidence-based practice” 
or “empirically supported treatments”: the attempt to produce a list of therapies that
have been systematically researched and found to be beneficial. Proponents of this
project argue that it is an essential attempt to summarize the state of scientific research
on the psychological therapies, and that it will ultimately benefit clients. Its opponents
argue that it is driven by the needs of the U.S. managed care industry or the U.K.
National Health Service to have short-term treatments to save money, and by factions
within clinical psychology itself that are seeking to advance their own favored
orientations at the expense of other approaches (Elliott, 1998).

Rigid rules for what is and is not science sometimes serve political purposes 
(e.g., fighting for limited funds from government or universities), and may have 
the unfortunate consequence of restricting healthy diversity in studying complex 
clinical phenomena. On the other hand, psychology in general now seems more
secure as a discipline, and as a consequence more psychologists now seem freer to
work within a broader definition of science and to use a wider range of methods.

One other important source of sociopolitical influence on scientific activity stems 
from the fact that research is conducted within an organized professional context. The 
official pronouncements of the clinical psychology profession have stressed the value 
of conducting research and have also sought to prescribe what type of research is 
regarded as legitimate. The various ways that this is expressed are examined next. 


PROFESSIONAL ISSUES 

It now seems almost uniformly accepted that research should be part of clinical
psychologists’ training and practice. How did this idea arise? What is the relationship
between clinical practice and research? Several models of how practitioners might produce
or consume research have been proposed, among them the scientist-practitioner model 
and the applied-scientist model. It is also worth considering, as a kind of baseline, a 
model of a psychologist who does not use research, which we have labeled the intuitive 
practitioner model. 
There are several models of how practitioners might produce or consume research:

• The intuitive practitioner, who conducts clinical work on the basis of 
personal intuition and knowledge from sources other than research. 

• The scientist-practitioner, who is competent as both a researcher and a 
practitioner. 

• The applied scientist, who conducts clinical work as a form of applied research. 

• The local clinical scientist, who applies a range of research methods and 
critical thinking skills to solve local problems in clinical settings. 

• The evidence-based practitioner, who systematically searches the literature 
to obtain the best evidence on which to base clinical decisions. 

• The clinical scientist, who draws on general psychology to produce research 
on clinical problems for the evidence-based practitioner to use. 

• The practice-based evidence model, in which clinicians generate evidence 
about the effectiveness of clinical services using their own routinely
collected data.


The Intuitive Practitioner 

We use the term intuitive practitioner to describe the therapist who does not
conduct research and does not consciously use research findings in their clinical 
work. Intuitive practitioners conduct clinical work mostly on the basis of knowledge 
drawn from personal experience, supervision, or reading clinical case reports; they 
are often highly skilled but are sometimes unable to articulate their implicit 
knowledge. If they do research at all, it is in the form of clinical observation and 
narrative case studies. This model overlaps with the concept of the reflective
practitioner (Schön, 1987); however, the main point here is the intuitive practitioner’s
relative neglect of research evidence, which may arise from a questioning of the 
value of scientific, as opposed to clinical, knowledge (e.g., Miller, 2004). Most 
clinical psychologists have probably encountered examples of skilled colleagues or 
supervisors who fit this description. 

It is hard to estimate what proportion of psychologists comes under this heading. 
It has often been observed that many psychological therapists do not conduct research 
or even consume it (Morrow-Bradley & Elliott, 1986). A much-cited statistic from 
surveys of U.S. clinicians is that the modal number of publications among practicing 
clinical psychologists is zero. However, this statistic (the mode) gives a misleading 
impression of the overall pattern, as the majority of respondents to the most recent 
survey had published at least one paper (Norcross, Karpiak, & Santoro, 2005). 
Furthermore, psychologists are often involved in research that does not reach the 
publication stage. Nevertheless, this statistic is a salutary reminder to academically 
oriented clinical psychologists, who have been at the forefront of articulating the 
more mainstream models of research and practice, that, despite all their earnest
pronouncements, research and research findings are generally not very salient in practicing
psychologists’ minds. 
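A small illustration may help show how the modal number of publications can be zero even when most respondents have published. The Python sketch below uses ten invented publication counts (hypothetical numbers chosen only for illustration, not data from Norcross, Karpiak, and Santoro, 2005) and computes the mode alongside the proportion of clinicians with at least one publication:

```python
from statistics import mode

# Hypothetical publication counts for ten clinicians, invented for illustration;
# these are NOT survey data from Norcross, Karpiak, & Santoro (2005).
publications = [0, 0, 0, 0, 1, 1, 2, 3, 5, 8]

# The mode is the single most frequent value: here 0 occurs four times.
most_common = mode(publications)

# Proportion of clinicians with one or more publications.
share_published = sum(1 for n in publications if n >= 1) / len(publications)

print(f"Mode: {most_common}")
print(f"Proportion with at least one publication: {share_published:.0%}")
```

Here the mode is 0, yet six of the ten hypothetical clinicians (60%) have published at least one paper, because the nonzero counts are spread across several different values rather than piling up on any single one.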


The Scientist-Practitioner

Since its inception, psychology has been a university-based discipline. It originally 
emerged out of philosophy in the 19th century and was later aligned with the natural 
sciences in order to give it increased respectability in the academic world. The
profession of clinical psychology started life in the first decades of the 20th century and was
initially concerned with “mental testing” as an aid to selection and diagnosis; it was 
only after World War II that its role expanded to include treatment (Humphreys, 
1996). However, during its transition from university to clinic, the profession sought 
to retain its academic roots, in that the distinctive role of the psychologist was seen to 
lie in his or her academic, scientific, or scholarly viewpoint. As we have mentioned, this 
academic viewpoint may lead to tensions with other colleagues in multidisciplinary 
clinical teams. 

This scientific role has received somewhat different emphases in the United States 
and the United Kingdom. In the United States it is known as the scientist-practitioner
(or Boulder) model; in the United Kingdom, as the applied scientist model.

The post-war expansion of U.S. clinical psychology, especially in the Veterans 
Administration Hospitals, led to an upgrading of training from the master’s to the
doctoral level, and to an examination of what such training should consist of (Hayes,
Barlow, & Nelson-Gray, 1999). The consensus view at the time was expressed in a 
conference at Boulder, Colorado, in 1949, and became known, naturally enough, as 
the Boulder model. The field was then in its infancy, its knowledge base was tiny, and 
there was a great need for placing the profession on a firm scientific footing, in order 
to know whether its procedures worked. The conference concluded that clinical
psychologists should be able to function both as scientists and practitioners, capable of
conducting research as well as clinical work. A quotation from an article published
shortly before the Boulder conference gives the flavor:

Participants [in doctoral training programs] should receive training in three 
functions: diagnosis, research, and therapy, with the special contributions of the 
psychologist as research worker emphasized throughout. (American Psychological 
Association, 1947, p. 549)

Thus the scientist-practitioner model emphasizes research and practice as separate, 
parallel activities. Clinical psychologists are seen as both productive researchers and 
productive clinicians. 

The main limitation of the scientist-practitioner model is that it is hard to put into 
practice. It demands a high level of skill and motivation in two distinct areas (research
and practice), and clinicians who are equally comfortable in both of these areas are
rare. Furthermore, the pressures of many clinical psychologists’ jobs make it hard to 
find the time and resources to do research. 

The Applied Scientist 

In the United Kingdom, the applied scientist model placed its emphasis slightly differently
from the American scientist-practitioner model: less on research and clinical work as
separate activities, more on the systematic integration of scientific method into 
clinical work. Monte Shapiro, one of the founders of British clinical psychology, set
out the three aspects of the applied scientist role (Shapiro, 1967, 1985): 

1. Applying the findings of general psychology to the area of mental health. 

2. Using only methods of assessment that have been scientifically validated.

3. Doing clinical work within the framework of the scientific method, by forming 
hypotheses about the nature and determinants of the client’s problems and
collecting data to test these hypotheses.

Thus, in Shapiro’s applied scientist model, research and practice are not dichotomized 
but integrated. This approach is also manifested in the behavioral tradition of single 
case experimental designs (see Chapter 9, this volume, and Hayes et al., 1999). 

In sum, the applied scientist is principally a clinician; the scientist-practitioner is 
both a clinician and a researcher. 

The applied scientist approach has several limitations. Like the scientist-practitioner
model, it can be hard to put into practice. It works better within some types
of therapy than others (e.g., it is hard to fit psychodynamically oriented therapies 
into this approach). Also, the intensive collection of session-by-session data can 
be burdensome for both therapist and client, although simplified versions of the 
Personal Questionnaire, one of Shapiro’s main data collection methods, have been 
developed (e.g., Elliott, Wagner, Sales, Rodgers, Alves, & Cafe, 2015). 


The Local Clinical Scientist 

Stricker and Trierweiler (1995; Trierweiler & Stricker, 1998) have put forward the
local clinical scientist model, a more flexible version of the applied scientist model. 
Their formulation is worth quoting here: 

The local clinical scientist is a critical investigator who uses scientific research and 
methods, general scholarship, and personal and professional experience to develop 
plausible and communicable formulations of local phenomena. This investigator 
draws on scientific theory and research, general world knowledge, acute
observational skills, and an open, skeptical stance toward the problem to conduct this inquiry.
(Trierweiler & Stricker, 1998, pp. 24-25)

Thus, in their formulation, Trierweiler and Stricker emphasize both quantitative
and qualitative methods of inquiry, as well as critical thinking skills, all adapted to the 
local needs and culture of the particular agency within which the psychologist is 
working. Their emphasis is firmly on understanding local phenomena, as opposed 
to producing generalizable knowledge. For them, the local clinical scientist is 
anthropologist, detective, and experimentalist, all rolled into one. While this is clearly 
a tall order, it can be understood as an aspirational goal. 

The scientist-practitioner, applied scientist, and local clinical scientist models all try 
to encompass research and practice within the same person. At their best, these models 
all set up a creative tension between practice and research; at their worst, they set up 
an impossible ideal leading to role strain and, ultimately, cynicism. The final three 
models, presented next, attempt to resolve the tension by shifting the balance toward
either practice or research. 


The Evidence-Based Practitioner 

On the one hand, practitioners can be regarded as consumers rather than producers 
of research. The notion of basing one’s practice on the systematic use of research
findings has been part of clinical psychology’s ethos since its inception. In the late 1990s
it found renewed currency within the medical profession. Sackett, Richardson,
Rosenberg, and Haynes’s (1997) book, Evidence-Based Medicine, has been influential
in articulating how this might work for individual doctors in practice.

Evidence-based medicine is defined as “the conscientious, explicit, and judicious 
use of current best evidence in making decisions about individual patients” (Sackett
et al., 1997, p. 2). Thus, when faced with a difficult clinical decision, whether about
diagnosis or treatment, doctors are urged to consult the research literature for an
answer. Sackett et al.’s book gives rules for how to judge good research. Evidence
from well-conducted randomized controlled trials and meta-analyses is regarded as
being especially valuable.

Within clinical psychology, there is a parallel, contemporary movement to identify 
“empirically supported treatments,” which we mentioned earlier in this chapter. This 
movement has grown out of the understandable need for healthcare purchasers 
(usually insurance companies in the United States; the National Health Service in the 
United Kingdom) to be reassured that they are paying for the most cost-effective 
care. However, there is considerable controversy about whether it is desirable, or even 
possible, to specify preferred treatments in this way, and particularly about the 
methods and standards used to designate certain therapies as “efficacious” or not 
(Kazdin, 2008; Westen, Novotny, & Thompson-Brenner, 2004).

The evidence-based practitioner model also leaves aside the question of who will 
produce new research findings, an issue addressed by the final two models. 


The Clinical Scientist 

At the opposite extreme from the intuitive practitioner is the clinical scientist model 
(not to be confused with the local clinical scientist model described above), which has 
recently emerged in North America (McFall, 2006). In this model, clinical
psychologists are first and foremost researchers, usually with substantial background in one or
more areas of experimental psychology, who study clinical phenomena but may not be 
involved in delivering clinical services at all. This is actually not a new model, but has 
only recently been distinguished from the traditional scientist-practitioner model. In 
a way, because of its emphasis on general psychology, this model is the opposite of the 
local clinical scientist. Clinical scientists are portrayed as producers of research that 
evidence-based practitioners can consume. 

This overall division of labor sounds logical, but has several potential drawbacks, all 
of which stem from the loss of the creative tension between research and practice. 
First, clinical scientists may be too divorced from clinical practice to produce research 
that is meaningful to practitioners. Second, in this model, practitioners may 
be relegated to the role of passive consumers or even technicians. Third, the actual
relation between research and practice is more often the opposite of that portrayed in
the clinical scientist and evidence-based practitioner models: innovations are more likely
to emerge out of clinical practice than out of research (Stiles, 1992). Researchers are 
thus more likely to be consumers of practice (e.g., subjecting clinical hypotheses to 
rigorous tests) than the other way around. 


The Practice-Based Evidence Model 

Finally, the most recently developed approach is the practice-based evidence model,
which emphasizes the value of conducting research in working clinical settings.
Practitioners gather data on their routine clinical practice, and these data are used to
generate evidence about how various interventions are working in the actual clinical
environment. The model has grown out of a dissatisfaction with the limitations of the
randomized designs that are relied on by the clinical scientist and evidence-based
practitioner models, which, as we will see in Chapter 8, emphasize internal validity (the
ability to infer causality) over external validity (the potential generalizability of the findings).

The book Practice-Based Evidence by Barkham, Hardy, and Mellor-Clark (2010)
gives a useful compendium of research methods that might be adopted. Two commonly
used approaches are “case tracking,” which uses frequently administered standardized
measures to monitor client progress, and “benchmarking,” in which the outcomes from
the service in question are compared with good practice from other services.


Comparison of Models 

The models described above each have a different orientation towards research, and 
emphasize different types of research (see Table 2.1). 

Producing versus Consuming Research 

The models can be ranked in terms of how much they regard the practitioner as a 
producer and as a consumer of research. As we have noted, the scientist-practitioner 
model assumes that the clinician will be producing research (as well as consuming it), 
whereas the evidence-based practitioner model emphasizes only the use of research. 
The applied scientist and local clinical scientist models take a middle position, 
focusing on doing research within a clinical context. The intuitive practitioner does
not produce or consume research, except in the form of case studies. Clinical scientists
produce research as their main function.


Table 2.1 Characteristics of professional models

Model                          Orientation to research            Research emphasized
Intuitive practitioner         Nonconsumer or indirect consumer   Narrative case studies
Scientist-practitioner         Producer and consumer              Basic and applied
Applied scientist              Integrated with clinical work      Applied small-N
Local clinical scientist       Integrated with clinical work      Evaluation and action
Evidence-based practitioner    Consumer                           Controlled trials
Clinical scientist             Producer                           Controlled trials
Practice-based evidence        Integrated with clinical work      Case tracking

Type of Research 

The scientist-practitioner model places no restriction upon the type of research that 
psychologists are expected to conduct. The applied scientist model, as its name implies, 
emphasizes applied research, often single case research or at least research using small 
sample sizes (see Chapter 9), while the local clinical scientist model emphasizes
evaluation and action research. The evidence-based practitioner and clinical scientist models
give preference to high-quality randomized controlled trials and meta-analyses. 


Implications for Clinical Training 

With the benefit of hindsight, the scientist-practitioner and the applied scientist 
models appear as ideals that have not been universally adopted or that may not even 
be universally desirable (Hayes et al., 1999; McFall, 2006). Many psychologists have
called for a reassessment of the role of research in clinical training. 

Three different training approaches are currently operating in parallel, at least in the
English-speaking world. First, many programs continue to adhere to the traditional 
scientist-practitioner, Boulder model. This includes many university-based doctoral 
programs in the United States, which award a PhD, and all training programs in the 
United Kingdom, which award a Doctorate in Clinical Psychology (DClinPsy). 
Second, some U.S. programs in “professional schools” of clinical psychology follow 
the Vail model (Korman, 1976), sometimes known as the practitioner-scholar model, 
adopting a broader definition of research that can be more easily integrated with
practice (Hoshmand & Polkinghorne, 1992; Peterson, 1991). They award the degree
Doctor of Psychology (PsyD). Finally, some U.S. PhD programs emphasize clinical 
science, downplaying the clinical practicum component of traditional programs in 
favor of rigorous scientific training (McFall, 2006). 

It is unclear at present how these different approaches will play out over time. 
Perhaps there is a need for different training and career routes to suit different personal 
preferences and abilities. As will be seen in the following section, many clinical
psychologists do not see science as being a central part of their professional identity, and
therefore it is questionable whether they need to be trained to be producers, rather
than consumers, of research. Other individuals may have greater interest or ability in
the science domain, and may possibly also lack the personal or interpersonal
characteristics needed to become a practitioner. There may be room in the world for a plurality
of training models to suit a plurality of roles, providing that it is clear to the general 
public what kinds of skills each type of psychologist possesses. 


PERSONAL ISSUES 


Having considered philosophical and professional issues, we now make a transition to 
the personal level. What motivates the individual clinical psychologist to engage in 
research, or not to, as the case may be?
• There are several different reasons that individual clinical psychologists
might have for being involved, or not being involved, in research. 

• Each psychologist will weigh each one differently.

• It is important for clinical psychologists to reflect on where research fits into 
their own practice. 


Why Do Clinical Psychologists Do Research? 

We have already mentioned the benefits of conducting research as a systematic way 
of developing knowledge and theory for the profession and science of clinical
psychology. There are also a variety of personal reasons why clinical psychologists may
wish to engage in research. Some of the more common ones are: 

• Curiosity. Research exists to answer questions: it must add something to knowledge 
at the end, otherwise there is no point in doing it. For many researchers, this is an end 
in itself: they want to make sense of the world and see research as a way of doing so. 

• Personal satisfaction. Some psychologists do research purely for the intrinsic
satisfaction. They may enjoy the challenge of research, feel a need to maintain their
intellectual sharpness (especially in the middle or later stages of their career), 
value the contact it brings with other colleagues, or simply see research as a break 
from everyday routine and a way of reducing occupational stress. There is also the 
satisfaction of seeing one’s work in print and of feeling one is shaping the 
development of one’s profession. 

• Professional and social change. Ideally, research should not just lead to an 
accumulation of knowledge, but also to some change in professional practice, or 
social or legal reforms. Karl Marx’s epitaph puts this point forcefully: “Philosophers 
have interpreted the world, the point, however, is to change it.” Many clinicians 
want to know which interventions work and which do not, and to change their 
practice, or that of their profession, accordingly. Others are disturbed by the 
manifest inequalities in western societies or the level of human suffering in
conflicted or war-torn parts of the world, and want to make social or political changes
that will alleviate psychological distress (e.g., Marmot, 2005; Orford, 2008). 

• Competition between professions and theoretical orientations. Similarly, some people 
may be drawn to research as a way of advancing their professional field or favored 
theoretical orientation. Research is a way of legitimizing existing professional
practices and of developing new ones. A large part of applied psychology’s claim to
professional status is that its procedures were legitimized by research. In marketing 
jargon, one of psychology’s “unique selling points” is that its practitioners possess 
research expertise. 

• Individual career needs. The career structure of one’s employing institution may 
dictate that in order to advance up the hierarchy one must conduct research. 
There are research requirements for clinical psychology students wanting to 
obtain a professional qualification, and a research track record is required for 
appointment to academic positions in the profession. 
• Institutional demands. In service settings, there is often pressure from management
to conduct evaluation or other forms of applied research. For example, the recent 
move towards “clinical governance” in the British National Health Service calls 
for practitioners to systematically monitor their treatment outcomes. 


Why Don’t Clinical Psychologists Do Research? 

Although there are many positive reasons for doing research, psychologists at all levels
of experience also voice several reasons to explain why they do not conduct, or draw
upon, research (Hayes et al., 1999; Morrow-Bradley & Elliott, 1986):

• Irrelevance. Research is seen as not saying anything useful about practice. It is seen 
as being overconcerned with rigor at the expense of relevance (i.e., journals are 
filled with rigorous but irrelevant studies). The main source of learning is felt to 
be clinical practice, rather than research studies. 

• Emphasis on generalities. There is a tension between the scientific stance, which 
looks for generalities and lawfulness, and the clinical stance, which stresses human 
individuality. Most research has been done within the nomothetic tradition, which 
emphasizes pooling people together to look for commonalities, rather than the 
idiographic tradition, which emphasizes individual uniqueness (Allport, 1962; see 
also Chapter 9). 

• Mistaken paradigm. The positivist paradigm (see Chapter 4), under which much 
research is conducted, is seen as being reductive and simplistic. This paradigm may
be linked with macro-political structures: for instance, feminists have critiqued
the patriarchal nature of traditional psychological research and Marxists have 
critiqued psychology’s emphasis on individualism at the expense of collectivism 
(see Chapter 5). 

• Intrusiveness. Research is seen as a blunt instrument that crushes the phenomenon 
under study. Much as zoologists may kill a butterfly to study it, so, for example, 
the intrusion of research procedures into a therapeutic relationship is felt to 
damage that relationship. For instance, therapists often fear that the act of
audio-recording a session might severely distort the therapy process, perhaps by
making the client apprehensive about confidentiality issues.

• Time demands. Research is time-consuming and often has a low priority compared 
to service demands. Also, it is often not supported or valued by managers or 
colleagues. 

• Technical expertise. Research is seen as requiring considerable technical expertise, 
with journal editors and other gatekeepers setting prohibitively high standards 
that discourage beginning researchers. 

• Ethical concerns. Research in general is felt to dehumanize participants subtly (or not
so subtly in some cases) by turning them into “subjects.” In addition, there are
ethical problems with some psychological studies, for example, those using deception.

• Bureaucracy. Conducting a research project usually involves a lot of preliminary 
paperwork and negotiation, in order to gain approval from the host institution’s 
gatekeepers (see Chapter 3) and to obtain ethical or Institutional Review Board 
approval (see Chapter 10). This can often deter researchers from conducting studies. 
• Bad training experiences. For various reasons, during their training many
psychologists experience research as a painful, alienating process. For example, some may
feel forced to study something they find uninteresting, or may feel inadequate for 
not being research-oriented. Others fail to receive sufficient direction or support 
for their research, and therefore find it to be a lonely or unmanageable activity. 

• Being scrutinized. Research participants can feel scrutinized, which may arouse 
great anxiety. This may make the conduct of the research project very difficult, 
particularly in evaluation studies, where the continuation of a service may depend 
on the findings. 

• Disturbing conclusions. Research may come up with findings that you do not like. 
It can lead to a painful re-examination of your cherished ideas if they do not 
match up to the facts. It may challenge your assumptions and ways of working, 
and this can be an uncomfortable process. 

The last two reasons have to do with the threatening aspects of research. Sometimes, 
these feelings of threat may not be directly acknowledged, but instead find their 
expression in the form of some of the other reasons listed. For example, practicing 
therapists may, understandably, feel sensitive about being scrutinized, and therefore 
may be reluctant to participate in a project which involves recording their sessions. 
However, they may argue against the project because of intrusiveness for the client, 
rather than admitting their own sense of vulnerability. 


Weighing up the Pros and Cons of Doing Research 

Different individuals will weigh each of the above positive and negative considerations 
differently. Some concentrate entirely on being a practicing clinician and never conduct 
research again once their training is completed. Others concentrate on an academic 
career and do little if any practice. Many, however, take a middle road, combining the 
two activities in their professional work, although perhaps only consuming research 
and not conducting it. We hope to show, in the remainder of this book, that doing 
research need not be a formidable challenge and that it is possible to conduct research 
even if you work primarily in a service setting. We will also offer practical advice aimed 
at making your research experience more interesting and less painful. 


CHAPTER SUMMARY 

In one sense, there is nothing special about doing research, in that the process of 
psychological research is similar to that of open-minded enquiry in everyday life. 
However, research is a disciplined and self-conscious activity that can be quite 
demanding. Clinical psychology considers itself to be scientific, but it is not easy to 
pin down exactly what science consists of. Several philosophers have attempted to 
characterize the essence of scientific progress: Popper, Kuhn, and Feyerabend are 
central figures. Science is a human enterprise, and its development is shaped by social 
and political forces. The scientist-practitioner model, and the closely related applied 
scientist model, are a central part of clinical psychology’s professional ideology. 
However, there is often a gap between the rhetoric and the reality: many clinical
psychologists no longer do research once they have qualified. Practicing psychologists 
may choose to do research, or not to, for a variety of personal reasons. This chapter 
has attempted to shed light on the research process from three perspectives: 
philosophical, professional, and personal. It is useful to keep these background issues 
in mind while planning, conducting, and reading research. 


FURTHER READING 

The Oxford Companion to Philosophy (Honderich, 1995) is a magnificent resource for 
anyone needing to know what all those troublesome “-isms” actually mean (and 
indeed for anyone at all curious about Life, the Universe and Everything). Alan 
Chalmers, in his book What is This Thing Called Science? (2013), explains complex 
philosophy of science issues with admirable clarity. However, he draws most of his 
examples from the natural sciences. Slife and Williams (1995) and Proctor and Capaldi 
(2006) review the range of philosophy of science positions, including alternatives to 
the traditional views, as applied to psychology. Since Popper and Kuhn are so often 
referred to, it is worth reading them both in the original. The obvious choice from 
Kuhn’s work is The Structure of Scientific Revolutions (1970). For Popper the choice 
is wider. Perhaps his most accessible work is Conjectures and Refutations (1963), 
especially the chapter of the same title. 

Research in the context of professional practice and clinical training is discussed in 
Benjamin (2005), Hayes et al. (1999), McFall (2006), and Tryon (2007). 


QUESTIONS FOR REFLECTION 

1. Where do you stand in the debate between the realist and constructionist 
approaches to research? 

2. Do you lean more towards basic research or applied research? 

3. Why do you think practitioners often don’t utilize research? Which reasons seem 
most justified? 

4. Which professional model regarding the role of research and practice do you 
identify with most closely? Why? 


KEY POINTS IN THIS CHAPTER

This chapter focuses on practical issues in getting a research project started: 

• choosing a topic and formulating the research questions or hypotheses; 

• searching and reviewing the literature; 

• writing a proposal and applying for funding; 

• dealing with the organizational politics of research in clinical settings. 


In sharp contrast to the previous chapter, this one focuses on practical rather than 
theoretical issues. It covers the first stage of the research process, which we label the 
groundwork stage. The researcher’s primary task at this stage is to formulate a set of 
research questions, often including specific hypotheses; the secondary task is to tackle 
organizational or political issues, in order to prepare the ground for data collection. 
Researchers often apply for ethics committee approval and for funding at this stage. 

In practice, as we noted in Chapter 1, the groundwork stage overlaps with the other 
two planning stages: selecting the measures and specifying the design. We are separating 
them here for didactic reasons, but readers who are actually planning a project will need 
to be working concurrently on measurement and design. For example, we cover funding 
issues here, but if you are applying for funding, the grant-giving committee will want a 
full plan of your research. Similarly, when you are applying for ethics approval, you will 
also need to submit a detailed plan for the project. 

Planning the study is usually quite anxiety-provoking, both because you are grappling 
with difficult intellectual and organizational problems, and also because researchers 
often feel that they are not being productive if they are not collecting or analyzing data.
You may be tempted to get the planning over and done with as soon as possible. This 
is usually a mistake. We have been involved with many studies, including some of our 
own, which have suffered because of inadequate planning. A poorly planned study can 
cause hours of frustration at the analysis, interpretation, and writing-up stage; at an 
extreme, the results may be worthless because of design faults or poorly thought-out 
research questions. Furthermore, such studies can be confusing to read, as the research 
questions and the research methods are often not fully consistent with one another. 
Time put in at the early stages often repays itself with interest later on, so it is worth 
trying to soothe your anxiety in order to take time over the planning. 

This chapter has two sections. The first considers how the research questions are
formulated and specified in a research proposal. The second looks at the politics of research
in clinical settings and other organizations, in particular how research in such settings 
can easily come to grief. Ethical issues, which often need to be addressed at this stage, 
are covered in Chapter 10, where we discuss research participants.


FORMULATING THE RESEARCH QUESTIONS 


Key points: 

• The process of planning research is painstaking and often anxiety-provoking, 
but effort put in here usually pays off later. 

• Research questions can be either exploratory (open-ended) or confirmatory
(hypothesis-testing).

• Various databases can be used to locate relevant literature. 

• Research tools help to make literature reviews more rigorous: systematic 
search and critical appraisal methods, quantitative meta-analysis, and 
qualitative meta-synthesis. 

• The planning process is iterative: feedback from colleagues, and one’s own 
second thoughts, mean that it is usual to take proposals through several 
drafts. 

• Research should generally be question-driven rather than method-driven. 
The essence of planning good research is making the procedures fit the 
questions, rather than the other way around. 


The first step in undertaking a piece of research is obviously to select a topic to 
investigate. By topic we mean a broad area of interest, such as “depression in children,”
“marital communication,” or “something in the area of therapy process.” This is all 
that is needed at the beginning of the project. As the planning progresses, the topic 
will become more and more focused, until you eventually arrive at some specific 
research questions. 

It is valuable to start keeping a personal research journal from the time you begin 
to think about your research questions. In it you can record the development of your 
ideas as the project progresses. You can also note your thoughts, feelings and general
reactions to the research process. For example, it is useful at the end of the project to 
have a record of where you’ve come from and why you took the decisions that you 
did (sometimes it’s hard to remember). You might also use the research journal during 
data collection and analysis, especially with qualitative projects, to record your 
thoughts about the meaning of your data. Such “research memos” (Corbin & Strauss, 
2015) can be incorporated into the analysis and write-up of the project. 


Choosing the Topic 

Ideally, the topic will arise naturally out of an intrinsic interest in the area, perhaps 
stimulated by clinical work, reading, or issues that have arisen in your personal life. 
All things being equal, it is better to choose a topic that excites or interests you, as you 
may have to live with it for a long time and the personal interest will help keep you 
going. In the case of PhD research, your choice of topic may influence the direction 
of your subsequent career. 

Sometimes researchers are drawn to topics that touch them personally, as with 
someone who has experienced anorexia doing research on eating disorders. There are 
pros and cons to this. The advantage of researching an issue in which you are personally 
involved is that it gives you the benefit of experiential knowledge (Borkman, 1990) that 
an outside researcher cannot possess. However, there is a danger that your own
experiences may lead you to overidentify with the participants. If you are going to conduct
research on a topic that is close to home, it is important to have some emotional distance 
in order to attain the necessary critical detachment. If you’re in the middle of a divorce, 
you probably want to avoid doing research on marital satisfaction. 

If the research is being done for extrinsic reasons (the prime example being when it 
is undertaken to fulfill a degree requirement), the problem arises of what to do when 
reflecting on your natural curiosities draws a blank, or produces only unrealistic topics. 
Fortunately, there is another broad path to follow: locating current important research 
fronts or “hot topics.” There are several ways of doing this. The first is to talk to a 
colleague or potential supervisor, ideally someone whose work you admire, to see if 
they can suggest a topic. The second is to browse through some current books and 
journals until something takes your fancy. Look at major research reviews for 
recommendations for further research; read Discussion sections of research reports 
for ideas for further study. Another possibility is to choose a topic on practical grounds. 
Is there some research going on in your institution that you could slot into? Is someone 
conducting a large project that you could take a part of? The disadvantages of working 
on a large project are that you may have less sense of ownership and achievement, as 
the study will not be directly yours. However, there are many advantages, such as the
feeling of being part of a larger team, the possibility of having a mentor within the 
project, being able to get assistance with data collection and analysis, and generally 
having the opportunity to discuss ideas throughout the project. The team approach is 
also closer to how research is normally conducted, where a scientist may assemble a 
group of fellow researchers at various levels of seniority from graduate students 
through post-doctoral fellows to faculty in order to form a research group working on 
a common problem or topic area. 


Developing the Questions

Having chosen your general topic area, the next step is to narrow it down into specific 
research questions or hypotheses. This step is important to get right, because, as we 
shall often argue, the methods of the research will flow naturally from the questions 
that are asked. Similarly, when you read a research article, the first thing to look for is 
precisely what questions the study is trying to answer. Some papers will clearly state 
their questions or hypotheses, usually at the end of the introduction section; others 
will leave the reader to infer them from the general drift of the paper. It makes it much 
easier for the reader to understand and evaluate the study if its questions are clearly 
spelled out. 

The first step is to formulate a few initial questions that encapsulate what you wish 
to find out from the study. It is a good idea to ask yourself what is the most important 
thing that you intend the project to tell you. Keeping this in mind helps you make 
choices later on if the study starts to become overly complicated. The number and 
complexity of the research questions will depend upon the time scale and the available 
resources. Practitioners doing service-oriented research on their own, or students 
carrying out a time-limited project, will need to ask circumscribed questions. Funded 
research teams undertaking multi-year projects can set themselves much more 
ambitious targets. 

Always bear in mind that the research must be able to teach you something: there 
is no point in conducting a study if it simply confirms what you knew before you 
started it. As we noted in the previous chapter, research should put one’s expectations 
or beliefs at risk, in the sense that it could falsify cherished ideas. It is worth trying to 
“game out” the study (Horowitz, 1982; Patton, 2008): in other words, ask yourself 
what its possible outcomes are, what you would learn from each of them, and which 
ones would make a difference in how you think about, or what you do about, the 
topic you are studying. Good studies yield useful information whatever their outcome, 
even if they fail to confirm initial predictions. 

It is not usually possible to formulate clear and useful research questions at the 
beginning of the planning phase of the study. Specifying the research questions is 
harder than it sounds. You need to pose an initial question, and then refine it by 
reading the literature, consulting with colleagues, deciding what is feasible in your 
intended research setting, considering what is practicable in terms of measurement 
and research design, and conducting pilot studies. This process often takes several 
weeks or months. In their final form, the research questions should be clear and 
concise, so that there is no ambiguity about what the study is aiming to achieve. 

It is important to begin by formulating the research questions in advance of 
developing your procedures. Beginning researchers often rush into selecting the 
measures that they will use before considering clearly what they really want to find 
out. This inevitably muddles their thinking and limits the range of questions they 
consider. For example, we have often seen researchers use measures simply because
they are easily available and well known (measures of the therapeutic alliance spring
to mind). The essence of planning good research is appropriate methods: making the
research procedures fit the questions rather than the other way around. In other 
words, the study should generally be question-driven rather than method-driven. 


Hypothesis-testing versus Exploratory Research Questions

A hypothesis is a statement of a proposition that will be tested by the research (although 
in practice it can be phrased either as a statement or as a question). It expresses a 
tentative prediction of the results that are expected to emerge. For example, a study 
of the social determinants of child and adolescent behavior problems hypothesized 
that “maternal depression would be positively associated with both externalizing and 
internalizing behavior problems [and that] greater neighborhood social capital would 
be associated with fewer child and adolescent behavior problems” (Delany-Brumsey, 
Mays, & Cochran, 2014, p. 277). 

The advantages of a carefully formulated hypothesis are that it gives an immediate 
clarity and focus to the investigation and it enables you to know immediately whether or 
not the findings of the study support its predictions. It is part of the hypothetico-deductive
view of science, which emphasizes the use of theory and previous research to generate 
testable hypotheses (as in Popper’s view of the scientific method: see Chapter 2). Using 
hypotheses also has the merit of increasing precision and fitting in more closely with the 
theory of statistical inference (Howell, 2010). 

On the other hand, stating the research questions in open-ended question form 
allows an exploratory, discovery-oriented approach (e.g., Elliott, 1984; Mahrer, 1988), 
in contrast to the confirmatory approach of the hypothetico-deductive model (see 
Table 3.1). There may not be sufficient theory or previous research to enable you to 
make meaningful hypotheses, or you may not want to constrain your investigation 
early on. What is important is to be clear about what you are trying to investigate. 
Exploratory, discovery-oriented research questions are typically descriptive. 
The research questions that guide exploratory research should be clearly delineated, 
in order to narrow the research topic to a workable size and to provide a central focus 
for data collection and analysis. For example, the question “How do police officers 
experience dealing with traumatic incidents?” would be too broad, but a more focused 
question such as “What are police officers’ experiences of supportive and unsupportive
interactions following traumatic incidents?” would be more workable (see Evans, 
Pistrang, & Billings, 2013). If you take the attitude, “I want to study x, so I’ll just 
collect some data and see what is interesting,” you are likely to end up with an 
incoherent mishmash of findings. 


Table 3.1 Hypothesis-testing and exploratory approaches to research 

                          Exploratory                       Hypothesis-testing
Logic                     Inductive                         Hypothetico-deductive
Scientific context        Discovery-oriented                Confirmatory

Example 1: Recovery       How does the experience of        Is the first month of sobriety
from alcohol abuse        sobriety evolve over the first    more difficult (in terms of
                          six months?                       psychological symptoms) than
                                                            the sixth month?

Example 2: Internet-      How do clients experience the     Do clients rate the therapeutic
delivered therapy         therapeutic alliance in           alliance in internet-delivered
                          internet-delivered therapy?       therapy as less strong than in
                                                            face-to-face therapy?


Open-ended, discovery-oriented research questions are typically most appropriate
under the following circumstances: 

• When a research area is relatively new or little is known about it, making it difficult 
or premature to ask more specific questions. 

• When a research area is confusing, contradictory, or not moving forward. This 
may be due to narrow conceptualization or premature quantification prior to
adequate open-ended descriptive work.

• When the topic is a highly complex event, process, or human experience, requiring 
careful definition or description. 

The Role of Theory 

As we noted in Chapter 2, research is always conducted within an explicit or implicit 
theoretical framework. Therefore, it is almost always useful to work at developing that 
framework and making it more explicit. For one thing, conducting your research 
within an explicit theoretical framework will guide the formulation of research 
questions. As the social psychologist Kurt Lewin famously said, “There is nothing so 
practical as a good theory” (Lewin, 1952, p. 169). Thus, it is an excellent idea to 
devote time early in the research process to locating an existing theoretical model or to 
formulating your own working model. You can do this by trying to map out the likely 
relationships between the variables you are studying. For example, if you are studying 
the relationship between therapist empathy and client outcome, you might think about 
what some of the intervening processes might be, as well as variables that might affect 
both empathy and outcome. Therapist empathy might facilitate client self-exploration, 
which might lead to better outcome; or client pretreatment self-awareness might 
facilitate both therapist empathy and client outcome (see Chapter 8). From a different 
theoretical perspective, therapist empathy might act as a reinforcer for client disclosure 
of feelings or statements of positive self-esteem. The theoretical model could then 
guide the selection of specific correlational or comparative research questions, suitable 
for quantitative investigation. 

Even in exploratory research, it is good practice for researchers to try to be as 
explicit as possible about their implicit theories or “preunderstandings” (Packer & 
Addison, 1989), in the form of expectations, hunches, or possible biases. The difference 
is that, in exploratory research, these implicit theories are set aside (referred to as 
“bracketing”: see Chapter 5) rather than explicitly tested. After an exploratory study 
is completed, the researcher may find it useful to compare the actual results to these 
original expectations, in order to determine what has been learned. 


Some Types of Research Question 

There are a number of different types of research question, which are often associated 
with different approaches to research. For example, questions about description often 
lend themselves to discovery-oriented qualitative research, while questions about 
correlation, comparison, or causality usually lead to quantitative methods. Since we 
have not yet discussed these specific procedures, we will not pursue this notion of 
appropriate methods here; we will return to it in subsequent chapters. We present 
some common types of questions below (see Creswell, 2009; Horowitz, 1982; and 
Meltzoff, 1998 for more extensive treatments). 

Description 

What is X like? What are its features, characteristics, or variations? How frequent or 
common is it? 


Examples: 

What are patients’ experiences of personal safety on psychiatric inpatient units?

What do people with bipolar affective disorder find helpful about mutual
support groups?

Which verbal response modes are most frequently used by cognitive therapists?

How common is social anxiety disorder?


Descriptive questions usually aim to provide a full picture of an experience or 
condition, including its variations. They might focus on its origin and development, 
or seek to provide examples of typical cases. Descriptive questions might also focus on 
amount or frequency, such as in epidemiological survey research. 

Descriptive-comparison 

Does group X differ from group Y? 


Examples: 

Do men and women differ in emotional expressiveness? 

What kinds of interactions occur in families with aggressive boys, compared to 
those with nonaggressive boys? 

Is a history of sexual abuse in childhood more common in bulimic than in 
nonbulimic individuals? 


This type of question extends the simple descriptive question. Note that it does 
not address issues of causality, although causality may be implied, as in the last example 
on bulimia. These questions aim to compare two or more groups of people who are 
defined in terms of some pre-existing differences, such as gender, socioeconomic 
status, or diagnostic category. However, questions addressing differences resulting 
from an experimental intervention (e.g., a therapeutic intervention) come under the 
heading of causality questions, below. 

Correlation 

Do X and Y covary, in other words, is there a relationship between them? Is that 
relationship affected by a third variable Z? 
Examples:

Is degree of marital support associated with speed of recovery from depression?

Is burnout among psychiatric nurses related to their experience of physical assault?

Does ethnic background affect the relationship between school achievement
and children’s self-esteem?


Correlational questions focus on investigating a possible association between two or 
more variables. These associations are often ones that have been predicted on the 
basis of theory or previous research. Although correlational questions may arise from 
an interest in causal explanations, they cannot be used to investigate causality. 

Causality 

Does X cause change in Y? Does X cause more change in Y than does Z? 


Examples: 

Do parent training groups lead to more effective parenting? 

Does family therapy prevent relapse in patients diagnosed with schizophrenia?

Does taking Ecstasy lead to impairments in memory?

Is dialectical behavior therapy more effective than treatment as usual for clients 
with borderline personality disorder? 


This type of question goes beyond the descriptive-comparison and correlation questions 
in attempting to look at causal influences. Establishing causal influence usually requires 
some sort of experimental design (Cook & Campbell, 1979; see also Chapters 8 and 
9). Some of these questions are phrased with an explicit comparison; for instance, in 
the final example above, dialectical behavior therapy is explicitly compared to treatment 
as usual. In other questions, the comparison is implicit; in the first example, parent 
training is implicitly compared with no parent training. More elaborate questions can 
also be asked, concerning interactions between variables (Meltzoff, 1998) such as “Is 
dialectical behavior therapy more effective than treatment as usual for male clients with 
borderline personality disorder, but not for female clients?” 

Measurement 

How well (reliably, validly, and usefully) can X be measured by means of measure M? 

Examples:

Can subtypes of marital conflict be measured reliably and distinguished from
one another?

How consistent are clients’ ratings on the Working Alliance Inventory over the
course of therapy?

Is the Group Environment Scale a valid way of measuring the climate of
multidisciplinary team meetings?


These questions often have some overlap with other question types. For example, the
second question, on the working alliance, is similar to a descriptive-comparison question
(comparing clients’ views at two or more time points), and the third question, on group
climate, implies a question about correlation (How well does this measure correlate with
other similar measures?). However, the distinguishing feature is that these questions
focus on the performance of a specific measurement instrument, or compare different
ways of measuring a construct, rather than relating different constructs to each other.


Literature Review 

Once the topic area has been chosen, the process of reviewing the literature starts, 
proceeding in parallel and in interaction with the process of formulating the research 
questions and planning the study. Before the research proposal is finalized, researchers
need to immerse themselves in the literature, so that they have read the key papers and
have a good notion of how the field as a whole is developing. Of course, it is impossible
to read everything, as most fields are too vast, but it is important to have a
broad sense of being on top of the literature. This is done for several reasons: 

• To assess how well developed the literature is, what kinds of gaps there are in it, 
and whether there has been sufficient preliminary descriptive research to define 
the phenomena of interest. 

• To see how far the existing literature answers the research questions. What can the 
proposed study add to the existing literature? Is there a need for another study? Has 
the study been done before? However, study duplication is rarely a great problem, 
because no two people ever seem to design a study in the same way and because it is 
usually easy to devise variations on something that has already been done before. 

• To help formulate the research questions in the light of theory or previous research, 
and, possibly, to give a theoretical or conceptual framework to work within. 

• To help with measurement and design issues. To see what measures and approaches 
have been used in previous studies, and what are the strengths and weaknesses of 
previous designs. 

Sources of Information 

In established research areas, there may be an enormous amount of literature, which 
can seem daunting to get to grips with. Several information sources can help speed up 
the process of reviewing psychological literature. The following list gives the most 
useful ones at the time of writing (2015), but valuable new sources do tend to spring 
up rapidly, and others fade away equally rapidly. Because things change so fast, the 
URLs printed here may change: we will endeavor to keep them up to date on the 
companion website to this book. 

• For doing a “scoping search,” that is, for getting acquainted with an unfamiliar
area at the outset of the study, Google Scholar (http://scholar.google.co.uk/), 
Web of Science (http://wokinfo.com/), and Scopus (http://www.scopus.com/ 
home.url) are invaluable. Google Scholar is freely available; the other two are well 
established, subscription-based databases which are more focused on academic 
journal articles. All three have similar functionality. They enable searches on 
authors, titles, or topic areas. They can list retrieved publications in order of their 
citation count, which gives a rough index of their importance. They also enable a 
citation search, which is useful for identifying papers that have subsequently cited 
a given publication; this will give a sense of current studies in the same line of 
work. (It is also useful for established researchers who want to know who is citing 
their own publications. Aside from giving narcissistic pleasure, this can also be 
useful to see who is active in the same research area.) 

• Discipline-specific databases are useful for doing formal searches, particularly as 
part of a systematic review (see the following section). Four main ones are 
frequently used. PsycINFO (www.apa.org/pubs/databases/psycinfo) is an
American Psychological Association database that indexes journal articles, books,
and technical reports. Searches can be conducted by selecting terms from the
Thesaurus of Psychological Index Terms, or text from titles and abstracts, as well
as authors’ names. MedLine (http://www.ncbi.nlm.nih.gov/pubmed) is a US
National Library of Medicine database which indexes journals across the whole
field of biomedicine. CINAHL (http://www.cinahl.com) focuses on the nursing
literature, and EMBASE (http://www.elsevier.com/online-tools/embase) is 
more pharmacologically oriented. 

• Handbooks. Current editions of handbooks in the topic area can be extremely 
useful. Examples are: Lambert (2013a) for psychotherapy research; Ayers et al. 
(2007) for health psychology; Woolfe, Strawbridge, Douglas, and Dryden (2010) 
for counseling psychology; Weiner and Otto (2014) for forensic psychology; and 
Gurd, Kischka, and Marshall (2012) for clinical neuropsychology. The American
Psychological Association publishes a useful series of handbooks (see
http://www.apa.org/pubs/books/institutions/handbooks.aspx), although they are
priced for purchase by institutions rather than individuals. All of these handbooks 
have chapters by expert authors providing comprehensive reviews of focused 
topic areas. 

• Review publications. Journals such as Clinical Psychology Review, Clinical Psychology: 
Science and Practice, and Evidence-Based Mental Health publish review articles on
major areas of research that are relevant to clinical practice. The Annual Review of 
Clinical Psychology has authoritative reviews of contemporary developments (and 
there are also Annual Reviews in related disciplines, for example, psychology in general, 
public health and sociology). The Cochrane Library
(http://www.thecochranelibrary.com/view/0/index.html) is another excellent source
of authoritative, contemporary
reviews of clinically relevant topics. 

• Recent books. Library catalogues can be searched by author, title, or subject. 
Also, the low-tech method of just browsing along library shelves, especially new 
acquisitions shelves, can often yield a useful collection of books to help start 
your search. 
• Current journals. It is worthwhile browsing through the tables of contents of the
last couple of years’ issues of the three or four most important journals covering
your topic area.

Finally, remember that librarians are experts in locating information and can help 
you to find the best sources of reference, or to use sources that you have difficulty 
with. Since new resources appear regularly, it is very useful to have specialist help
in your search. 

Systematic Reviews 

A literature review can be a substantial academic enterprise in its own right, which may 
lead to a publishable paper. Many research questions can be answered by summarizing 
the existing literature, rather than by collecting new data. However, a weakness of
traditional reviews is that they are often unclear about how studies were located and
evaluated, which leaves them open to potential sources of bias.

Systematic reviews aim to minimize potential bias by being transparent, 
comprehensive, and replicable (Centre for Reviews and Dissemination, 2008; Cochrane 
Library, 2015). To avoid the problem of reviewers cherry-picking studies, they use
systematic procedures both for selecting the studies to be reviewed and for appraising
and synthesizing them.

The review should pose a specific question (e.g., “What is the effectiveness of
internet-delivered psychological therapy for anxiety and depression?”). This question 
then dictates inclusion and exclusion criteria for the studies to be reviewed. 
For treatment-effectiveness studies, reviewers often use the PICOS categories: 
population (e.g., people seeking help for anxiety problems), intervention (e.g., 
internet-delivered cognitive behavioral therapy [CBT]), comparator (e.g., CBT 
delivered face to face), outcomes (e.g., anxiety self-report measures), and setting 
(e.g., community). The criteria can also cover methodological issues (e.g., the study
design or the measures) and publication parameters, such as dates, language, and whether
to include only peer-reviewed journals or also the “grey literature” (theses, conference
presentations, and book chapters).

The inclusion and exclusion criteria specify the type of studies that you are looking 
for. These criteria are then translated into search terms, which stipulate how the 
studies will be located. They need to specify the databases (PsycINFO, Medline, 
etc.) and the search terms, including the publication type (date, peer-reviewed 
paper, language). You can also specify other (nonelectronic) types of searches such 
as a “hand search” of named key journals or a search of the reference lists of articles 
included in the review. 
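The translation from criteria to search terms usually follows a simple logic: synonyms for the same concept are combined with OR, and the different concepts are combined with AND. The Python sketch below shows only that logic. The concept groups and terms are hypothetical, and real databases (PsycINFO, MEDLINE, etc.) each have their own syntax for field tags, truncation, and thesaurus terms, which the sketch ignores.

```python
# Hypothetical concept groups, loosely following the PICOS example above.
# Real searches need database-specific syntax (field tags, truncation,
# thesaurus terms), which varies between databases and is not modeled here.
CONCEPTS = {
    "population": ["anxiety", "depression"],
    "intervention": ["internet", "online", "web-based"],
    "design": ["randomized", "randomised", "controlled trial"],
}


def quote(term: str) -> str:
    """Wrap multi-word terms in double quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def build_query(concepts: dict[str, list[str]]) -> str:
    """Combine synonyms with OR inside each concept, and concepts with AND."""
    groups = []
    for terms in concepts.values():
        groups.append("(" + " OR ".join(quote(t) for t in terms) + ")")
    return " AND ".join(groups)


if __name__ == "__main__":
    print(build_query(CONCEPTS))
```

Keeping each concept as a separate group makes it easy to report the search strategy exactly, which is what allows other researchers to replicate it.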

The search will usually produce a large initial number of hits, which can then be
scanned online to see if any can be eliminated on the basis of the title and abstract
(and also to remove duplicates if searching over multiple databases). Once this is done, 
the remaining papers need to be read in full to see which will be included in the final 
body of studies to be reviewed. Two or three researchers usually make the judgments 
about whether studies meet the criteria for review, in order to increase the reliability of these decisions.
In the published paper, the various stages of the study selection process are usually 
illustrated by a flowchart. 
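The text does not say how agreement between screeners should be checked. One common option, assumed here, is Cohen's kappa, which corrects the raw proportion of agreement for the agreement that would be expected by chance. The Python sketch below computes it for two screeners; the include/exclude decisions are invented for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa: agreement between two raters, corrected for chance agreement."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must judge the same non-empty set of records")
    n = len(rater_a)
    # Observed agreement: proportion of records given the same decision.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: products of the two raters' marginal counts, summed over categories.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n**2
    if expected == 1.0:
        # Both raters used one and the same category for every record, so kappa
        # is undefined; this sketch returns 1.0 as an arbitrary choice.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical include/exclude decisions on ten abstracts.
    a = ["in", "in", "out", "out", "out", "in", "out", "out", "out", "out"]
    b = ["in", "out", "out", "out", "out", "in", "out", "out", "in", "out"]
    print(f"kappa = {cohens_kappa(a, b):.2f}")
```

For these made-up decisions the two screeners agree on 8 of the 10 abstracts, but kappa is only about 0.52, because much of that agreement would be expected by chance when most records are excluded.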
It can be useful to organize your references with a bibliographic database, for
example, EndNote, Reference Manager, Zotero, Mendeley, or (for meta-analyses)
RevMan.

Once the set of studies for the review is finalized, their methodological quality (or “risk 
of bias”) needs to be assessed. This is usually done via a “critical appraisal tool,” that is, a 
set of checklists or guidelines for evaluating studies using different types of designs. There 
are many such tools available. A full list is given on the supplementary website to this
book; some examples are Downs and Black (1998) for studies of interventions, the
Newcastle-Ottawa Scale (http://www.ohri.ca/programs/clinical_epidemiology/oxford.asp)
for epidemiological studies, and the Critical Appraisal Skills Programme
(CASP: http://www.casp-uk.net/) for qualitative studies.

The body of the review gives a critical appraisal and synthesis of the retrieved 
studies. They can be organized in any way that will make sense to the reader, for 
example, by intervention type or by research design. The review needs to summarize 
the findings of the studies and to give an overall assessment of the strength of the 
evidence, giving greater weight to the more methodologically sound studies. 

The AMSTAR (Assessing the Methodological Quality of Systematic Reviews)
checklist (see http://amstar.ca/Amstar_Checklist.php) and the PRISMA (Preferred
Reporting Items for Systematic Reviews and Meta-Analyses) statement (see
http://www.prisma-statement.org/index.htm) give detailed guidelines for, respectively,
appraising and reporting systematic reviews.

In conclusion, the advantages of systematic reviews are that they are transparent and 
replicable, thereby allowing other researchers to verify their findings. They are 
particularly good for questions concerning the effectiveness of interventions or the 
relationship between two variables; they are less good for reviews addressing more 
theoretical or conceptual issues. Another drawback is that the search strategies are not 
always as robust (or efficient) as they initially seem. They are often difficult to formulate 
and do not always identify all the relevant studies. It is not unusual for reviewers to find 
key studies through less formal routes such as informal conversations with colleagues 
or browsing through journals (Greenhalgh & Peacock, 2005). 

Meta-analysis 

Meta-analysis is a specialized type of systematic review and a form of research in its own 
right. Its name means “analysis of analyses.” It is a sophisticated procedure, which was 
pioneered by Smith and Glass (1977) in a seminal paper analyzing psychotherapy 
outcome studies; subsequently, it has become the standard method for integrating
quantitative research findings in all evidence-based disciplines. 

The study selection procedures in meta-analysis are identical to those of systematic 
reviews in general (see the previous section), but the study synthesis uses quantitative 
methods to summarize the overall pattern of the findings. Although its general 
principles are not difficult to understand, conducting a meta-analysis is a technical 
business whose mechanics lie beyond our present scope. For a practical overview we 
recommend Lipsey and Wilson (2001) or Borenstein, Hedges, Higgins, and Rothstein 
(2009), and Cooper, Hedges, and Valentine (2009) or Schmidt and Hunter (2015) 
for more in-depth coverage. In brief, meta-analysis uses an index of the strength of the 
findings in each study (an effect-size measure - see Chapter 12), which is then averaged
across all of the studies in the review (usually weighting each study by its precision, so
that larger studies count for more), yielding a single statistic that summarizes the
overall result of the review. For example, the average controlled effect size for
studies of the psychological treatment of social anxiety disorder is 0.77, which
suggests that psychological therapy is overall a beneficial intervention for this problem
(Acarturk, Cuijpers, van Straten, & de Graaf, 2009).
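The averaging step can be made concrete with a short Python sketch. It assumes a fixed-effect model, in which each study's effect size (here Cohen's d) is weighted by the inverse of its sampling variance, so that larger, more precise studies count for more. Many published meta-analyses instead use a random-effects model, which also allows for true differences between studies (see Borenstein et al., 2009). The variance formula is the usual large-sample approximation for d, and the study data are invented for illustration; they are not taken from Acarturk et al.

```python
import math
from dataclasses import dataclass


@dataclass
class Study:
    d: float  # standardized mean difference (Cohen's d) reported by the study
    n1: int   # treatment group size
    n2: int   # comparison group size


def variance_of_d(s: Study) -> float:
    """Large-sample approximation to the sampling variance of d."""
    n = s.n1 + s.n2
    return n / (s.n1 * s.n2) + s.d ** 2 / (2 * n)


def fixed_effect_mean(studies: list[Study]) -> tuple[float, float, float]:
    """Inverse-variance weighted mean effect size and its 95% confidence interval."""
    weights = [1 / variance_of_d(s) for s in studies]
    mean = sum(w * s.d for w, s in zip(weights, studies)) / sum(weights)
    se = math.sqrt(1 / sum(weights))  # standard error of the weighted mean
    return mean, mean - 1.96 * se, mean + 1.96 * se


if __name__ == "__main__":
    # Hypothetical studies, invented for illustration only.
    studies = [Study(0.9, 20, 20), Study(0.6, 60, 55), Study(0.8, 35, 40)]
    mean, low, high = fixed_effect_mean(studies)
    print(f"mean d = {mean:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

Weighting is the key design choice: a simple unweighted average would let a small, imprecise study influence the summary as much as a large one.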

Meta-analysis can also be used to look at which features of a study are associated 
with specific results. For instance, in the psychotherapy outcome literature, you can 
examine whether studies that use a sample of clients from a university counseling service
have a different pattern of results from studies that use a sample drawn from a
clinical population, or whether the results of studies conducted by a team who have an 
allegiance to the therapy differ from those conducted by independent investigators. 

The advantage of meta-analysis over traditional reviewing methods is that it is a 
more powerful way of aggregating the literature and of detecting trends across studies, 
although it has been criticized for giving too much weight to methodologically 
unsound studies or alternatively for re-introducing bias in the study selection process 
(see Lipsey & Wilson, 2001). In recent years it has become the standard method for 
systematically reviewing large bodies of literature, throughout psychology and in 
many other disciplines. 

Meta-synthesis 

Qualitative researchers have recently developed a qualitative analog to meta-analysis, 
often called “meta-synthesis,” or sometimes “meta-ethnography” (Pope, Mays, & 
Popay, 2007; Timulak, 2009). It is essentially a thematic analysis of thematic analyses. 
As is the case with qualitative approaches to analyzing primary data (see Chapter 5), 
there are a variety of approaches, which differ, for example, in how much they focus
on theory development, their degree of interpretation, and the ways in which they 
integrate the data (Dixon-Woods et al., 2006; Noyes & Lewin, 2011). 

The procedure typically involves using the themes from each individual study as the 
raw data to conduct an overarching thematic analysis (for a description of thematic 
analysis see Chapters 5 and 12). For example, Timulak (2007) analyzed seven 
qualitative studies of client descriptions of the impact of helpful events in therapy. The 
meta-synthesis produced nine overarching themes, including insight, empowerment, 
and feeling understood. 


The Proposal 

As your ideas start to become clearer, it is worth setting them down on paper. This will 
help you show other people what you are planning to do (if you can bear exposing 
your less than perfect ideas to other people) and it will also help you develop your 
thoughts yourself, as it is much easier to re-think something that is down on paper 
rather than just in your head. 

At the very least, prepare a one- to three-page summary of your proposed research 
questions, the theoretical model, and your measures and design. You can use this to 
get some initial feedback, recruit research supervisors (in the United States, doctoral 
committee members) and get early consultations. You can then expand it into a 
longer proposal. 
Often (e.g., for PhD research and grant applications) a formal research proposal is
required. This is a useful effort, as the proposal will form the core of the introduction 
and method sections of your final report and will help you sharpen your thinking. It is 
best approached by successive approximations. The first step is to draft a rough 
outline, to get something down on paper, no matter how sketchy. Proposals evolve 
through multiple drafts - five or six is common - as you continue to read, talk, and 
think about the project. 

The structure of the proposal is similar to the introduction and method section of 
a journal article. It should state what the research topic is and why it is important, 
briefly review what has already been done and what psychological theory can be 
brought to bear on the problem, and summarize the intended study and its research 
questions or hypotheses. The method section describes in detail the proposed 
design and measurement aspects of the study. A typical proposal has the following 
structure: 


Outline of a research proposal

Introduction
    Statement of the research topic and its importance
    Focused literature review (covering previous research and psychological theory)
    Rationale for and overview of the proposed study
    Research questions or hypotheses
Method
    Participants
    Design
    Measures
    Ethical considerations
    Data analysis procedures
Expected results and implications (optional)
Timetable (optional; see below)
References
Costings (for grant proposals)


You may want to give an estimated timetable for the project in your proposal. Even if 
you do not include one, it is usually helpful at this stage to map one out for your own 
consumption. List each of the major tasks that comprise the project and try to estimate 
how long each one will take and what other tasks need to be completed before it can 
be done. A formal proposal, for example, for a grant application, will usually depict 
this using a Gantt chart. Whatever the format, a useful rule of thumb, especially in
doctoral research, is to double any time estimate: expect everything to take twice as
long as you think it will (Hodgson & Rollnick, 1996). In our experience, the most common causes of
problems in student projects are a slow initial start and unexpected delays later on, 
often out of your control (e.g., ethics committees, access to participants, and data 
collection problems). 
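Working out what must be finished before each task can start is a small scheduling problem, which Gantt-chart software solves for you. The Python sketch below does the same calculation directly for a hypothetical set of tasks (the names, durations, and dependencies are invented), and applies Hodgson and Rollnick's doubling rule through a safety_factor parameter of our own devising.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: name -> (estimated months, tasks that must finish first).
TASKS = {
    "literature review": (4, []),
    "proposal": (2, ["literature review"]),
    "ethics approval": (2, ["proposal"]),
    "draft intro and method": (1, ["proposal"]),
    "data collection": (10, ["ethics approval"]),
    "data analysis": (3, ["data collection"]),
    "write-up": (3, ["data analysis", "draft intro and method"]),
}


def finish_times(tasks: dict[str, tuple[float, list[str]]],
                 safety_factor: float = 2.0) -> dict[str, float]:
    """Earliest finish month of each task, with every estimate multiplied by safety_factor."""
    graph = {name: deps for name, (_, deps) in tasks.items()}
    finish: dict[str, float] = {}
    # static_order() yields each task only after all of its prerequisites.
    for name in TopologicalSorter(graph).static_order():
        months, deps = tasks[name]
        start = max((finish[d] for d in deps), default=0.0)
        finish[name] = start + months * safety_factor
    return finish


if __name__ == "__main__":
    for name, month in finish_times(TASKS).items():
        print(f"{name}: finishes by month {month:g}")
```

With safety_factor=1.0 the hypothetical plan ends at month 24; with the default doubling it ends at month 48, the kind of mismatch that is better discovered at the proposal stage than halfway through the project.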
Possible timetable for a two-year student project

Month 1        Start reading the background literature in your general area of interest.
Months 2-4     Decide on the topic and formulate preliminary research questions. Find a
               supervisor. Continue the literature review.
Months 5-6     Draft a one- or two-page proposal. Discuss the project in the setting in
               which you will carry it out.
Months 7-8     Apply to your local ethics committee for approval. In the process, finalize
               the research plan and prepare for data collection.
Month 9        Begin data collection.
Month 10       Write the first draft of the introduction and method sections.
Months 11-18   Data collection continues. Re-draft the introduction and method sections.
Month 19       Finish data collection. Begin data analysis.
Months 20-21   Complete the data analysis. Write the first draft of the results and
               discussion sections.
Months 22-23   Complete the write-up. Give the final draft to your supervisor for
               comments. Make any advance arrangements for duplication and binding.
Month 24       Make final corrections. Duplicate, bind, and submit the polished version.


Consultations 

It is a good idea to get a variety of opinions on your proposal from people of different 
backgrounds: for example, colleagues who know the research area, potential research 
supervisors, psychologists outside of your area, and nonpsychologists. No research is 
carried out in isolation: it is always helpful to get input from lay people and from 
colleagues in the scientific community. Even if many of their suggestions cannot be 
implemented, you will often find that something of value emerges each time you present 
your ideas to someone else. 

It is particularly useful to get feedback on your ideas and methods from current or 
former users of the services that you are studying. Their experiential knowledge 
(Borkman, 1990) can give a valuable perspective on the study. Many users of services
understandably feel strongly about being marginalized by researchers and decision 
makers, and want to have their voices heard, on the principle of “nothing about us 
without us.” Such service-user involvement can usefully continue beyond the proposal 
stage, throughout the course of the project. Trivedi and Wykes (2002) outline a
spectrum of options for user involvement in research, ranging from, as the title of 
their paper says, “passive subjects to equal partners.” 

You may want to email some key researchers in the field, to ask for measures, 
reprints, or details of current work. Also, consider attending a conference, as this is 
an excellent way to meet people with similar interests in order to exchange ideas, 
learn about work that has not yet reached the journals, and generally make 
connections. 
Piloting

Pilot studies are small-scale try-outs of various aspects of your intended protocol. 
Initial pilots may be done with colleagues or friends role-playing participants. This will 
help you get the administrative procedures roughly right and reveal any gross errors 
in measurement or design. Subsequent pilots can be with people closer to the target 
population that you intend to study. 

The importance of piloting cannot be stressed enough. Just as a jumbo jet is not 
built straight from the drawing board without extensive testing, it is rarely possible to 
design a study in your armchair and then translate it straight into action. You always 
need to test out procedures, measures, and design. Some things that look good on 
paper just do not work in practice: they are not understandable to participants or they 
do not yield useful information. It is also worthwhile performing a few crucial analyses 
on the pilot data to try out coding and analysis procedures and to see whether the 
data can actually be used to answer the research questions. A few hours here can save 
you weeks or months of anguish later on. 


Funding 

It is often possible to get funding for clinical psychology research, if you plan your 
application in advance. The major expense in most psychological research projects is 
research assistant time. Other expenses will be equipment (e.g., for computing or 
recording) and supplies (e.g., printing, photocopying, and postage), although this 
part of the budget will usually be small, in contrast to biomedical research, where 
equipment is often a substantial component of the budget. Finally, there is payment 
to participants, for time and travel expenses. 

The format for grant proposals varies from agency to agency; it is important to 
obtain applicants’ guidelines from potential agencies before starting work on the 
proposal. However, most proposals will follow the broad outline we have discussed 
above, with a final section on the proposed costs and timetable of the research 
(Brooks, 1996). The goal of the proposal is to convince the awarding body that 
you have a well-thought-out plan for innovative and valuable research. The opinions
of nonspecialist colleagues can help predict how the agency might react to
your proposal. 

Grant-giving bodies often employ a multi-stage screening process. Administrative 
staff will first read your proposal to check that it falls within the mission of the funding 
body and that its estimated costs are reasonable. Then it will be sent out to professional 
reviewers, who will be familiar with the area of research that you are proposing. 
They will give it an overall rating and supply a detailed report. These will be considered
by a meeting of the grant-giving committee, who will be professionals in the
field, though probably not specialists in your area. They will be looking to support 
proposals that demonstrate scientific excellence, have the potential for real-world 
impact, and have a realistic estimate of costs and timetable. 

Specific sources of funds are too numerous and rapidly changing to list here. 
They can be classified into central and local government agencies, charities, businesses,
universities, and health service bodies. Many universities, especially in the
United States, have officials who can help you identify funding sources for your
research. Competition is fierce, and even if your project is turned down, it does not
mean that it was not worthwhile. It is worth asking to see the referees’ reports, if 
they are available, to identify any weaknesses in your proposal before revising and 
resubmitting it elsewhere. 


THE POLITICS OF RESEARCH IN APPLIED SETTINGS 


Key points: 

• A good relationship between the researcher and the setting is vital to the 
success of the project. 

• A poor relationship can complicate, delay, or thwart the project. 

• Access to research settings is often controlled by official or unofficial 
gatekeepers. 

• It is important to get people in the setting on your side and to respond honestly
to their doubts about the research.


Researchers often underestimate the organizational difficulties of conducting 
research in field, as opposed to laboratory, settings. Obtaining access, especially to 
highly bureaucratic settings such as hospitals, schools, and mental health agencies, 
may require months. It is vital to start doing your groundwork early on, in order to 
establish whether it is viable to do the study in your proposed setting. You need to 
develop a relationship with the gatekeepers and managers, as well as with the clients, 
staff, etc. Although some people will be supportive of your research, others will 
oppose it, not always openly. 


Access 

Negotiating access often requires considerable flexibility, political savvy, and interpersonal
skills. Many researchers simply avoid the whole process by creating their own
settings, which they can more thoroughly control (Shadish, Cook, & Campbell, 
2001). However, if you want to study settings outside the laboratory or research 
clinic, access problems are hard to avoid. 

The first step is to identify and approach the gatekeepers of the setting (Taylor & 
Bogdan, 1998), that is, those who control access and help protect it from disruptive 
outside interests. Gatekeepers vary in their apparent social and political power, from 
receptionists who screen out unwanted inquiries, to managers, senior doctors, head 
teachers, or school principals. An initial working knowledge of the role structure and 
formality of the setting greatly facilitates the access process and may prevent disasters
or wasted effort. Cowen and Gesten (1980) recommend starting at the top of the 
setting and working your way down (otherwise, leaders are likely to be insulted or 
suspicious and refuse permission on general principles). They also note that newer 
programs tend to be less formal and more flexible. It is generally useful to have a prior 
association with the setting or the support of someone who is trusted (Shadish et al., 
2001; Taylor & Bogdan, 1998). 

If you have not already done so, it is important to begin your research journal at this point
and, in addition, qualitative researchers may start keeping detailed field notes. 
The access process often yields essential information about how the setting functions 
and how it protects itself, which is worth describing in its own right. 

The next step is to present your request to conduct research in the setting. Be clear 
about what you are proposing to do. It often helps to avoid misinterpretation if you 
put things in writing, stating the general aim of the research and how the data will be 
collected. This is an adaptation of the brief proposal that we suggested above, in 
everyday, jargon-free language; you might do it as a kind of press release, such as 
would be given out to a local newspaper. A draft participant information sheet and 
informed consent form (see Chapter 10) are often needed. It is advisable to make your
own presentations to the administration or staff, rather than giving in to the temptation
to let someone else do it for you, as they will often forget or do a poor job of it
(Cowen & Gesten, 1980). Presentations to staff meetings should also be supplemented
with personal meetings, especially with resistant individuals, preferably in
their own setting rather than yours. 

In addition, there is often a formal screening process, such as a human subjects 
review or research ethics committee. We will address ethical issues in Chapter 10, but 
it is worth anticipating substantial delays, which may be difficult if you are a student 
trying to complete your research within a tight timetable. Delays may occur for two 
reasons. First, ethics committees may meet infrequently or at intervals that do not fit 
in well with your plans. Second, they may raise objections about your research, which 
you will need to respond to (sometimes involving substantial changes to the protocol) 
before you proceed with the study. 


Responding to Doubts 

You often have to work to get people’s goodwill and to get them on your side. They 
may not be convinced by your research topic - people in applied settings may have 
little understanding of psychological research - but at least they should trust you. 
A senior doctor once said to one of us, “I’m allergic to anything beginning with 
psych!”, but he was still willing to cooperate with our project because he trusted us. 

People might oppose your project for rational, practical reasons, as even the best-run 
projects inevitably cause disruption, and some services are constantly being asked for
permission to conduct studies. They might also oppose it in order to protect patients, who may
be in danger of being overresearched, even with adequate informed consent procedures. 
For example, we are aware of projects that have been turned down in services for 
post-traumatic stress disorder because of patients being on the receiving end of too 
much research. There are also instances of researchers who exploit the people in the 
setting by, for example, failing to keep them informed or not acknowledging their 
contribution when the research is published. 

In addition to these rational, practical concerns, research in service settings often 
arouses feelings of threat and suspicion (Hardy, 1993; Weiss, 1972). It can be seen as 
intrusive, critical, and a challenge to the established way of doing things. Be sensitive 
to such opposition: if you do not listen to and attempt to meet people’s fears at the 
planning stage, the likelihood is that the study will be undermined later on. Often 
these fears may be expressed indirectly: medical and nursing staff may appear overly 
protective of “their” patients, forms may be mysteriously lost, and so on. 

Furthermore, your research may become embroiled in the internal politics of the 
setting. It is important to be aware of existing organizational tensions, as your study 
may be used as part of a power struggle: different factions may gain from seeing it 
carried out or from blocking it (Hardy, 1993). 

Your clinical and consulting skills are often valuable both in understanding and in
responding sensitively to other people’s doubts about the research. In responding
to their often complex feelings, it is important to be open about what you intend
to do and why. Goodman (1972) describes how he put the client-centered principles
of disclosure, empathy, and acceptance into action in a large community psychology
project that evaluated the effects of companionship therapy for emotionally
troubled boys: 

A careless procedural mistake or two, showing cause for mistrust and generating 
serious community complaint, could close down the project. We therefore sought to 
reduce risks by establishing some global operating principles that would simultaneously
protect our participants and our research goals.

Eventually, the principles took the form of a general approach, or a “clinical attitude”
toward the community, consistent with the client-centered theory of effective
therapist-client relationships. That is, we would try to empathize with any complaints 
about us, accept community apprehension and protective activities, and disclose our 
own needs and plans - including the global intervention strategy. ... Sometimes we 
also disclosed the motives for our disclosures (meta-disclosure). Implementing this 
approach took extra time initially, but it brought trust and proved efficient in the 
long run. (Goodman, 1972, p. 2) 

The central issue is what the members of the setting get out of being involved in 
your research. From their point of view, research has a high nuisance factor. You 
need to minimize such nuisances and help them realize any possible gains: possible 
helpful feedback on procedures, the opportunity for patients to talk about their 
concerns, the increased status of being part of an academic setting, and so on. 
In Hardy’s (1993) terms, you need to align the goals of the research with the goals 
of the setting. Where possible, it can be useful to include a staff member as part of 
the research team and to ask the staff to contribute to the design of the research 
(Cowen & Gesten, 1980; Patton, 2008). However, some clinicians will not want 
any involvement, while others will want to be kept closely informed. It is wise to 
provide information to all important staff members: circulate updates, show drafts 
of papers, etc. 

Through these contacts, the researcher, gatekeepers and prospective participants 
engage in a process of negotiation, in order to arrive at an agreement about how the 
research will be conducted. This agreement, which can be formal or informal, makes 
clear what the researcher is asking for, as well as spelling out the researcher’s obligations
in regard to confidentiality, avoiding disruption, feedback of findings, etc.
Authorship

If you are intending to publish the study, it is worth considering authorship 
issues from the outset. In applied settings, such issues can be complicated, and 
occasionally emotionally fraught, because several people may be involved in different
ways in the research. This is better tackled sooner rather than later. Senior
staff sometimes ask to have their name on a paper simply because the research is 
being done in their unit. Unless they have made a significant contribution to the 
research, this is inappropriate and, for psychologists, unethical (American 
Psychological Association, 2010a, 2010b; British Psychological Society, 2011). 
Appreciation for permission to work in a setting should be mentioned in the 
Acknowledgements section of the research paper. (We discuss authorship issues 
further in Chapter 12.) 


CHAPTER SUMMARY 

The first stage of the research process involves doing the groundwork. There are two 
major tasks here: formulating the research questions or hypotheses and resolving any 
organizational or political issues, in order to prepare the ground for data collection. 
Researchers may also need to apply for ethics committee approval and for funding at 
this stage. 

Planning research begins with developing a set of research questions for which you 
would like to find answers. After this, you can consider what type each question is and 
the appropriate method that goes along with it. As you progress through the groundwork
phase of your research, you are likely to revise your questions substantially.
Specifying useful, intellectually coherent, and realistic research questions is an iterative 
process that involves systematically appraising the existing literature, consulting with 
expert colleagues, and conducting small-scale pilot studies. The research questions 
can be formulated from either an exploratory, discovery-oriented or a confirmatory, 
hypothesis-testing approach.

An essential component of planning research is locating and getting to know the 
literature, so that your study can build on what has come before. Several sources of 
information can assist with this process. Literature reviews can range in their degree 
of rigor, from informal reviews to systematic reviews. Meta-analyses and meta-syntheses
are systematic and rigorous ways to appraise a body of quantitative and
qualitative studies respectively, and may be publishable studies in their own right. 

Successfully dealing with the organizational politics of the research setting requires 
considerable skill. A good relationship between the researcher and the setting is vital 
to the success of the project, whereas a poor relationship can complicate, delay, or 
thwart the project. Access to research settings is often controlled by official or unofficial
gatekeepers, with whom the researcher must negotiate at the start of the project.
People in applied settings often have justified reservations about having research conducted
in their organizations. It is important to get people on your side and to
respond honestly to their doubts. 



Doing the Groundwork 




FURTHER READING 

Hodgson and Rollnick’s (1996) amusing chapter entitled “More fun, less stress: How
to survive in research” is well worth reading, especially their tongue-in-cheek laws of 
research (sample: “A research project will change twice in the middle”); there is a 
contemporary summary on Dorothy Bishop’s (2011) blog. Rudestam and Newton’s 
(2007) book Surviving Your Dissertation is especially useful for students and has
some good material on planning and writing. 

Cooper et al. (2009) and Pope et al. (2007) are useful general references for reviewing
and synthesizing studies. Lipsey and Wilson (2001) is a good practical guide to
meta-analysis. For qualitative meta-synthesis, it is worth consulting the Cochrane 
Collaboration material (Noyes & Lewin, 2011), which is freely available online. 

Several classic texts have material on the politics of working in field settings, for 
example, from the point of view of evaluation research (Weiss, 1972), participant 
observation (Taylor & Bogdan, 1998), and experimental and quasi-experimental 
designs (Shadish et al., 2002). 


QUESTIONS FOR REFLECTION 

1. What research topics are you contemplating? Why? 

2. Construct one or two research questions on your topic area. Look at each one and 
decide whether it is exploratory or hypothesis-testing. Now, take each question 
and try writing it as the opposite kind (i.e., if hypothesis-testing, make an 
exploratory version; if exploratory, make a hypothesis-testing version). What 
would the implications be of shifting your research question one way or the other? 

3. What types of research question have been most common in research on your 
topic? What types have been neglected? 

4. Which theoretical frameworks could you draw upon in order to clarify your ideas? 

5. Think about a setting that you might need to gain access to in order to do your 
research: Who are the likely gatekeepers? What stake do they have in the setting? 
What are their concerns likely to be? 
KEY POINTS IN THIS CHAPTER

• How to measure a psychological construct depends on the research questions, 
the theoretical framework, and the available resources. 

• Measures can be classified into self-report and observation, and also 
according to whether they are qualitative or quantitative. 

• The quantitative approach partly derives from the philosophy of positivism.

• Psychometric theory is the study of the properties of quantitative measures. 

• Reliability concerns the reproducibility of measurement. 

• Validity assesses the meaning of measurement. 


We have now reached the second of the four stages of the research process, the 
measurement stage. It consists of deciding how to assess each of the psychological 
concepts that are to be studied. Before examining actual methods of measurement 
(which are covered in Chapters 6 and 7), we will first consider the underlying theory 
of psychological measurement. The present chapter will concentrate on the conceptual 
foundations of quantitative approaches; qualitative approaches are covered in 
Chapter 5. (Note that the term measurement is being used somewhat loosely here, to 
cover the process of finding ways of capturing the constructs of interest. Strictly 
speaking, measurement implies numbers, so it is stretching the concept quite far to 
use it to accommodate qualitative approaches as well.) 


Research Methods in Clinical Psychology: An Introduction for Students and Practitioners,
Third Edition. Chris Barker, Nancy Pistrang, and Robert Elliott. 

© 2016 John Wiley & Sons, Ltd. Published 2016 by John Wiley & Sons, Ltd. 
Furthermore, as we noted in Chapter 3, developing psychometrically sound
quantitative measures is an important research activity in itself, especially for new 
research areas. If the absence of adequate measures looks like an obstacle to your 
research, it may mean that the logical first step is measure development research, in 
which case the issues we present in this chapter are at the heart of your study. 

The previous, groundwork, stage will have culminated in the formulation of a set
of research questions or hypotheses involving various psychological constructs.
The measurement stage of the project then consists of specifying how each of these
constructs is to be assessed. For example, the research questions might be about the role 
of social support in psychological adjustment to stressful life events. The researcher then 
needs to decide how to assess social support, how to capture emotional adjustment, and 
what constitutes a stressful life event. There are two separate but interdependent issues 
to be considered: how each construct is defined and how it is measured. 

The boundary between the groundwork and measurement stages (and also the 
design stage, which we cover in Chapter 8) is not, of course, as watertight as we are 
implying. For instance, measurement considerations may shape the formulation of the 
research questions. If an investigator knows there is no valid method for measuring 
something, it will be difficult to study it. It would be fascinating to study the content of 
dreams as they are happening during sleep, but there is presently no conceivable method
of doing this. Research about real-time dream images cannot, therefore, be conducted, 
and we can only rely on people’s recall of dreams after they wake up. Furthermore, 
some types of measurement may be beyond the time constraints or the financial 
resources of the researcher. For example, in research on the process of family therapy, 
transcribing interviews and training raters to code them is time-consuming and 
expensive. It may, therefore, be inappropriate for a project with little or no funding. 

However, it does not greatly distort the research process to treat groundwork, 
measurement, and design as three separate, sequential stages. For the rest of this 
chapter, we will assume that the research questions have already been decided upon 
(at least for the time being) and that we are now solely concerned with translating 
them into measurement procedures. 

Separating the groundwork and measurement stages is also helpful in beginning to 
think about the study. Novice researchers often worry prematurely about measurement. 
As we argued in Chapter 3, it is better to think first about what to study and only 
secondarily about how to do it. Ideas about measurement will often flow from clearly 
formulated research questions; measurement will be less problematic if you’ve thought 
through the questions you are asking. 

As in the previous, groundwork, stage, there are several important conceptual 
issues, some of which are controversial. We are including this material to give 
essential background information and in order to help you think more broadly 
about the framework within which research is conducted. The present chapter and 
Chapter 5 cover this conceptual background material, whereas Chapters 6 and 7 
deal with practical issues in selecting or constructing measures. The first section of 
this present chapter defines some key terms and looks at the general process of 
measurement, the second section examines the conceptual foundations of the 
quantitative approaches, and the third addresses psychometric theory, in particular 
the concepts of reliability and validity. 
THE PROCESS OF MEASUREMENT

Domains of Variables 

Variables studied in clinical psychology research can be grouped into five general domains: 

• Cognitive: thoughts, attitudes, beliefs, expectations, attributions, memory, 
reasoning, etc. 

• Affective: feelings, emotions, moods, bodily sensations, etc. 

• Behavioral: actions, performance, skill, speech, etc. 

• Biological: physiological and anatomical, for example, heart rate, blood pressure, 
brain activity, immune functioning, etc. 

• Social: acute and chronic stressors, social supports, family functioning, work, etc. 

These variables form the content of psychological research: the research questions will 
have been framed in terms of several of them. However, each must be clearly defined 
and translated into one or more practical measurement methods. 


Measuring Psychological Constructs 

We will use the term “measurement” in a general sense to refer to the process of 
finding indicators for psychological concepts. Virtually all psychological concepts can 
have multiple possible indicators, for example: 

• Phobia: observed avoidance; self-report of fear; physiological measures of 
sympathetic nervous system arousal in presence of phobic stimulus. 

• Pain: self-report of intensity; pain behaviors (flinching, crying, avoidance of painful
stimuli); clinician’s judgment.

• Perfectionism: semi-structured clinical interview; standardized questionnaires. 

The underlying psychological concept (phobia, pain, perfectionism) is known as a
construct; the indicator, or way of observing it, is known as a measure of that construct.
Although this language is associated with quantitative methods, it can usefully be 
applied to qualitative methods as well, though in this case researchers may speak of a 
phenomenon rather than of a construct. 

The process of going from a construct to its associated measure is often called 
operationalization. The construct is latent; that is, it can never be known directly
and can only be inferred from its associated measurement operations.

Operationalization: Construct → Measure

Operationalization is, however, not quite as simple as this diagram implies. As the above 
three examples show, there are often several different ways to operationalize a given 
construct. Which one(s) to use depends on the research questions, the theoretical framework,
and the resources available for the study. A second, related difficulty is that
no single measure can perfectly capture a construct. Finally, the relationship between
a construct and its measure(s) is not usually unidirectional: how the construct is measured
can shape how we understand or define it. Thus, the whole process of measurement
is more complex than suggested by the traditional view of operationalization, as there is 
rarely a clear-cut, direct mapping of constructs onto measurement operations. 

In order to facilitate the process of operationalization, the construct may be given 
an operational definition, that is, it may be defined so that it can be easily measured. 
Thus, empathy may be initially conceptualized as “Entering the private perceptual 
world of the other and becoming thoroughly at home in it” (Rogers, 1975), but for 
psychotherapy process research it may be operationally defined as how accurately the 
therapist responds to the client’s expressed feelings, which then leads to its being 
measured using expert ratings of audio-recorded interactions. 

It is not always possible or desirable to develop an operational definition for every 
construct. Earlier generations of researchers were taught the doctrine of operationism 
(Stevens, 1935), which stated that a concept is identical to its measurement operations, 
for example, intelligence is what IQ tests measure. This doctrine was subsequently 
rejected by philosophers, and replaced by the critical realist strategy of converging 
operations (Grace, 2001), which advocates using multiple indicators to measure 
underlying constructs. In the clinical context, many important constructs cannot be
adequately captured by our current measures. Social skills may be operationalized
by such indicators as good eye contact, smiling, etc., but performing only these
behaviors does not produce socially skilled interactions; rather, it tends to
produce people who act like robots (or breakfast TV presenters). In line with the critical
realist position, we are arguing that most psychological constructs are only partially
captured by their associated measures. We will take up these issues again in the following
two sections when we discuss positivism and construct validity.

The operational definition of a construct clearly depends on how it is conceptualized
theoretically. For example, two ways of measuring social support - by counting
the number of people in a person’s social network or by assessing the quality of relationships
within that network - have different implications for what is meant by the
social support construct itself: whether good social support means several potentially 
supportive relationships available or just a few good ones. This issue is known as the 
theory-dependence of measurement (see also Chapter 2). Any way of measuring a 
concept presupposes a view of what that concept consists of, as it is impossible to have 
pure observations. For qualitative researchers, new knowledge or understanding is 
always based on prior knowledge or understanding, in the form of presumptions or 
expectations; it is impossible to start from a totally blank slate (Willig, 2013). 

A further complication is that the act of measurement often changes the person or 
situation being measured, a phenomenon known as reactivity of measurement. For 
example, people may behave differently if they know that they are being observed, 
and asking a client to monitor the day-to-day frequency of her anxious thoughts may 
in itself affect the frequency of those thoughts. Another example is that merely filling 
out a questionnaire about intention to donate blood has been shown to increase the 
likelihood of subsequent donations (Godin et al., 2010). Research participants are
often influenced by their perception of what the researcher is trying to find out, and 
may respond to the demand characteristics of the study (Orne, 1962) when they are 
completing questionnaires. 
Measurement Sources and Approaches

Sources of measurement can be categorized into self-report and observation: you can 
either ask people about themselves or look at what they are doing. (Strictly speaking, 
self-report data should be called verbal-report, since it can be gathered from several 
perspectives, for example, the person of interest, a significant other, or a therapist or 
teacher. Similarly, observational data could be gathered from an observer, a 
psychological test, or a physiological measure.) Data may be collected from either 
source using either qualitative or quantitative methods (see Table 4.1). 

The distinction between quantitative and qualitative methods raises a number of 
fundamental epistemological issues and visions of what science is (as discussed in Chapter 2). 
Each method derives from contrasting academic and philosophical traditions. 

Quantitative methods are identified with the so-called “hard science” disciplines, principally 
physics; qualitative methods, with the “soft” social sciences, such as sociology and 
anthropology, and the humanities. In the early decades of the 20th century, many influential 
psychologists felt that the road to academic prestige and legitimacy lay with being considered 
“hard science,” and thus sought to identify psychology with physics and quantitative methods 
(Polkinghorne, 1983). This issue has been a continuing struggle within psychology, with its 
roots in older philosophical traditions (idealism and realism) and early schools of psychology 
(e.g., introspectionism and associationism). The structure of the present chapter and the 
following one reflects this debate. The two chapters have a thesis, antithesis, synthesis form: we will attempt
to set out the underlying issues for each approach, and then suggest in the final section of 
Chapter 5 how they might be integrated in practice. 


Table 4.1 Examples of measures classified by source and approach

                 Self-report                   Observation
Quantitative     Attitude questionnaires       Behavioral observation
                 Symptom checklists            Psychological tests of ability
                                               Physiological measures
                                               Brain scan
Qualitative      Qualitative interviews        Participant observation
                 Diaries, journals             Projective tests


FOUNDATIONS OF QUANTITATIVE METHODS 


• Quantitative methods are useful both for precise description and for 
comparison. They fit in well with the hypothetico-deductive view of science. 

• There are well-developed procedures for the analysis of quantitative data. 

• The philosophical position of positivism seeks to model psychology and the 
social sciences on the methods used in the physical sciences. 

• The positivist approach was taken up in psychology in the form of methodological
behaviorism.

• Positivism has been heavily critiqued, especially by qualitative researchers. 
Quantitative methods, by definition, are those that use numbers. The main
advantages of quantitative measurement are as follows: 

• Using numbers enables greater precision in measurement. There is a well-developed 
theory of reliability and validity to assess measurement errors; this enables researchers 
to know how much confidence to place in their measures. 

• There are well-established statistical methods for analyzing the data. The data can 
be easily summarized, which facilitates communication of the findings. 

• Quantitative measurements facilitate comparison. They allow researchers to get 
the reactions of many people to specific stimuli and to compare responses across 
individuals. 

• Quantitative methods fit in well with hypothetico-deductive approaches. 
Hypothesized relationships between variables can be specified using a mathematical 
model, and the methods of statistical inference can be used to see how well the data
fit the predictions. 

• Sampling theory can be used to estimate how well the findings generalize beyond 
the sample in the study to the wider population from which the sample was drawn. 
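As an illustration of the last two points, the following Python sketch summarizes a small set of scores and attaches an approximate 95% confidence interval to the mean, one basic way of expressing how precisely a sample estimates a population value. The scores are invented for illustration, the function name mean_with_ci is hypothetical, and the normal-approximation multiplier of 1.96 is an assumption; with a sample this small, a t-distribution critical value would give a somewhat wider interval.

```python
import math
import statistics

def mean_with_ci(scores, z=1.96):
    """Return (mean, lower, upper) for an approximate 95% confidence interval.

    Uses the normal approximation (z = 1.96); for small samples a
    t-distribution critical value would be more appropriate.
    """
    n = len(scores)
    m = statistics.mean(scores)
    se = statistics.stdev(scores) / math.sqrt(n)  # standard error of the mean
    return m, m - z * se, m + z * se

# Hypothetical questionnaire scores from a small sample (illustrative only)
sample = [12, 15, 9, 14, 11, 13, 10, 16]
m, lo, hi = mean_with_ci(sample)
print(f"mean = {m:.2f}, 95% CI = [{lo:.2f}, {hi:.2f}]")
```

The width of the interval shrinks as the sample grows, because the standard error is divided by the square root of n; this is the sense in which larger samples support more confident generalization to the population.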

The development of science would have been impossible without quantification. 
The ancient Egyptians’ need to preserve the dimensions of their fields after
the flooding of the Nile led to the development of geometry (Dilke, 1987). If the fields
could be measured and mapped out, their boundaries could be restored once the waters 
had subsided. However, it was not until the late Renaissance that quantification and 
mathematics began to become an integral part of science. In the 17th century, Newton’s 
laws of motion employed fairly simple algebra to provide a tool of great power and beauty 
that enabled scientists to predict the behavior of falling apples and orbiting planets. 

Positivism 

The success of quantitative methods in the physical sciences, especially that of 
Newtonian mechanics, led to an attempt to extend them into other areas of enquiry. 
The 19th-century philosopher Auguste Comte articulated the doctrine of positivism
(which is, despite its name, completely unrelated to positive psychology). This doctrine
has been much elaborated by succeeding scholars, although these elaborations
have not always been consistent with each other or with Comte’s original formulation 
(see Bryant, 1985; Bryman, 1988, 2012; McGrath & Johnson, 2003; Proctor & 
Capaldi, 2006), making it difficult to formulate precisely what is meant by positivism. 
However, its three main tenets are usually taken to be: 

1. That scientific attention should be restricted to observable facts. (The word 
“positive” is used here in its older sense of dealing with matters of fact, that is, things 
that people are “positively certain” about.) “Inferred constructs,” such as beliefs or 
motives, have no place in science. This is a version of empiricism (the belief that all 
knowledge is derived from sensory experience). 

2. That the methods of the physical sciences (e.g., quantification, separation into 
independent and dependent variables, and formulation of general laws) should also be 
applied to the social sciences. 

3. That science is objective and value-free. 
A related 20th-century development was logical positivism. It is associated with the
Vienna Circle group of philosophers, such as Carnap, and was strongly influenced by Wittgenstein. They sought
to analyze which propositions have meaning and which do not, and then to restrict 
philosophical discourse to those things that can properly be talked about. Wittgenstein’s 
famous dictum captures the flavor: “What can be said at all can be said clearly, and 
what we cannot talk about we must pass over in silence” (Wittgenstein, 1921/1961: 
3). The logical positivists’ central criterion, that all philosophical statements should 
be reducible to sensory experience, rules out discussion of metaphysical concepts. 
They contend that it is pointless arguing about ideas like the meaning of life, since 
propositions concerning it can never be verified. 

Methodological Behaviorism 

The positivist doctrine was incorporated into psychology in the form of methodological
behaviorism, whose best-known proponents were Watson and Skinner. They
sought to restrict psychology to a “science of behavior,” eschewing consideration of 
any “inner variables,” such as cognitions and affects. For instance, Watson urged: 
“Let us limit ourselves to things that can be observed and formulate laws concerning 
only those things” (Watson, 1931: 6). In other words, methodological behaviorists 
would not say that a rat was hungry, as hunger is an inferred construct; instead, they 
would say it had been deprived of food for eight hours, or that it was at ninety percent 
of its initial body weight. 

Similarly, they would not talk about aggression, but rather such specific behaviors 
as kicking, punching, or insulting someone. As we noted in our discussion of 
operationism, the meaning of psychological constructs was limited to the operations 
or procedures used to measure them. 

This attitude was a reaction to the perceived limitations of the introspectionism that 
had preceded it. Introspection consisted of the investigator observing the contents of 
his or her own consciousness and attempting to expound general theories therefrom. 
The virtue of sticking to observable behavior is that it is clear what you are talking 
about and your conclusions can be replicated by other investigators. 

Another important manifestation of methodological behaviorism was found in 
clinical work, in the behavioral assessment movement (e.g., Goldfried & Kent, 1972). 
This called for clinical assessment to be tied closely to observable behavior, to remove 
the inferences that clinicians make when, for example, they label a client as having a 
“hysterical personality.” 

However, the distinction between high-inference and low-inference measures may 
be less useful than it seems at first, since inference must occur sooner or later in the 
research process to give psychological meaning to the data. There is a kind of 
conservation of inference law: the lower the level of inference in the measurement process,
the higher the level of inference needed to connect the observed data to interesting
variables or phenomena. Conversely, high-inference measures often do much of
the researcher’s work early on in the data collection phase, requiring less inference to 
make sense of the data obtained. For example, a measure of nonverbal behavior in 
family therapy might use a low level of inference, but the researcher may need to make 
further inferences in order to make sense of the data, such as interpreting certain body 
postures as indicating detachment from the family. On the other hand, a measure of 
transference in psychotherapy might require a high level of inference in rating clients’
verbal statements, but no further inferences may be needed to interpret the data. 

Criticisms of Positivism 

The positivist, methodological behaviorist stance has been severely criticized both 
from within and outside of psychology (see, e.g., Bryman, 1988; Koch, 1964; 
McGrath & Johnson, 2003; Rogers, 1985). The central criticism is that, when 
carried through rigorously, it leads to draconian restrictions on what can be studied 
or talked about. Psychological constructs that attempt to capture important 
aspects of experience such as feelings, values, and meanings, are ruled out of court. 
It leads to a sterile and trivial discipline alienated from human experience. Although 
few researchers today adopt a strict methodological behaviorism, some articles in 
mainstream psychological journals still seem to lose sight of the people behind 
the statistics. 

The rise of quantitative methods has also been associated with the rise of 
capitalism. Young (1979) argues that reducing everything to numbers is a manifestation
of a balance sheet mentality. A brilliant fictional indictment of such a
mentality is found in Charles Dickens’s 1854 novel Hard Times, which starkly
depicts the loss of humanity that comes from reducing all transactions to
quantitative terms. This criticism is still timely, in the light of managed care in the
United States and the culture of targets, performance indicators, clinical audit, 
and cost-effectiveness in the British National Health Service. Emphasizing easily 
measurable indices of workload often leaves out the less tangible - and arguably 
more important - aspects of quality. 

Conclusions 

In our view, the important message to take from the positivists is the value of being 
explicit about how psychological constructs are measured. It reminds researchers and 
theorists to be conscious of measurement issues, to tie their discourse to potential 
observations, and, when speculating about more abstract constructs, to have an 
awareness of what measurement operations lie behind them. For example, if you are 
attempting to study complex constructs such as defense mechanisms, you need to 
specify what would lead you to conclude that someone is using denial or projective 
identification. Cronbach and Meehl’s (1955) notion of construct validity, which we 
discuss below, is an attempt to place the use of inferred constructs on a sound
methodological basis.

It is worth noting that, although quantification and positivism are often treated as 
equivalent, the stress on quantification is actually only a small part of the positivist 
package, and possibly not even a necessary part. For example, qualitative methods can 
be used purely descriptively, without using inferred constructs. Also, the role of 
quantification in science may have been overstated by the positivists. Schwartz (1992) 
points to examples in the physical and biological sciences, for example, the double 
helix model of DNA, that use mainly descriptive qualitative methods. 

Having described the rationale for quantification, we can now look at the 
underlying theory of measurement, including the important question of how to 
evaluate particular measures. 
PSYCHOMETRIC THEORY

Psychometric theory refers to the theory underlying psychological measurement in 
general. In particular, it leads to a framework for evaluating specific measurement 
instruments. Although developed in the context of quantitative measurement, some 
of its ideas can, arguably, be translated into qualitative methods. They are essential for 
all would-be researchers to grasp, whatever approach they ultimately plan to adopt. 


Key points: 

• Psychometric theory is the theory of psychological measurement.

• Classical test theory leads from a simple set of assumptions to useful concepts 
for evaluating specific measurement instruments, notably reliability and 
validity. 

• Reliability refers to the reproducibility of measurement: its principal subtypes 
are test-retest, equivalent forms, internal consistency, and inter-rater 
reliability. 

• Validity assesses the meaning of measurement. It can be divided into 
content, face, criterion, and construct validity. 

• Utility refers to how easy the measure is to administer and to how much 
information it adds. 

• Two alternatives to classical test theory are generalizability theory and item 
response theory. 


Definitions 


Scales of Measurement 

Measurements may have the properties of nominal, ordinal, and interval scales (Stevens, 
1946). Nominal scales consist of a set of mutually exclusive categories, with no implicit
ordering. For example, researchers in a psychology outpatient service might use a simplified
diagnostic system consisting of three categories: 1 = anxiety; 2 = depression;
3 = other. In this case, the numbers are simply labels for the categories: there is no sense 
in which 2 is greater than 1, etc., and thus the diagnostic system forms a nominal scale. 


Scales of measurement: 

• Nominal: no ordering 

• Ordinal: ordering only 

• Interval: ordering plus distance 


An ordinal scale is like a nominal scale but with the additional property of ordered 
categories, that is, it measures a variable along some continuum. For example, 
psychology clients might be rated on a scale of psychosocial impairment, consisting 
of three categories: 1 = slightly or not at all impaired; 2 = moderately impaired; 3 =
highly impaired. On this scale, someone with a score of 3 is defined to be more 
impaired than someone with a score of 2, and thus it has ordinal properties. However, 
there is no assumption that the distance between successive ordinal scale points is 
the same, that is, the distance between 1 and 2 is not necessarily the same as that 
between 2 and 3. 

An interval scale is like an ordinal scale with the additional property that the 
distances between successive points are assumed to be equal. For example, the Patient
Health Questionnaire (PHQ-9: Kroenke, Spitzer, & Williams, 2001), a self-report
measure of depression, is usually treated as an interval scale. This assumes that the 
increase in severity of depression from a score of 10 to a score of 15 is equivalent to 
the increase in severity from 20 to 25. 

The importance of distinguishing between these types of measurement is that 
different mathematical and statistical methods are recommended to analyze data from 
the different scale types. A scale needs to have interval properties before adding and 
subtracting have any meaning. Thus it makes no sense to report the arithmetic average 
of nominal scale data: the mode must be used instead. Nominal and ordinal scales also 
require nonparametric or distribution-free statistical methods, whereas interval scales 
can potentially be analyzed using standard methods such as the t-test and the analysis 
of variance, provided that the data are normally distributed (Howell, 2010). In actual 
practice, however, the line between ordinal and interval data is blurred. Item-response 
theory analyses (Bond & Fox, 2007; see later in this chapter) have shown that many 
popular measures violate the equal-interval assumption, making them in fact ordinal 
measures, and, furthermore, item-response theory methods can be used to transform 
clearly ordinal measures into true interval measures. 
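The link between scale type and summary statistic can be illustrated with a short Python sketch. It reuses the codings from the examples above (the three diagnostic categories, the three-point impairment rating, and PHQ-9-style totals), but the data values are invented and the function name central_tendency is hypothetical. Reporting the median for ordinal data is a common convention that we are assuming here; the text itself specifies only the mode for nominal data and nonparametric methods for ordinal data.

```python
import statistics

def central_tendency(values, scale):
    """Return a summary appropriate to the scale of measurement.

    nominal  -> mode (categories have no order)
    ordinal  -> median (order is meaningful, distances are not)
    interval -> mean (equal distances make adding and averaging meaningful)
    """
    if scale == "nominal":
        return statistics.mode(values)
    if scale == "ordinal":
        return statistics.median(values)
    if scale == "interval":
        return statistics.mean(values)
    raise ValueError(f"unknown scale type: {scale!r}")

# Hypothetical data using the codings described in the text
diagnosis = [1, 2, 2, 3, 2, 1]        # 1 = anxiety, 2 = depression, 3 = other
impairment = [1, 3, 2, 2, 3, 1, 2]    # 1 = slight/none, 2 = moderate, 3 = high
phq9 = [10, 15, 20, 25]               # totals treated as interval data

print(central_tendency(diagnosis, "nominal"))   # most common category code
print(central_tendency(impairment, "ordinal"))
print(central_tendency(phq9, "interval"))
```

Because the boundary between ordinal and interval data is blurred in practice, the scale label passed to such a function is itself a substantive judgment about the measure, not something the code can verify.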


Type of Measure 

Measures can be either nomothetic or idiographic. Nomothetic measures compare 
individuals with other individuals; most psychological tests and inventories fall into 
this category. The scores on a nomothetic measure can be norm-referenced when 
they have no absolute meaning in themselves, but are simply indicative of how the 
individual stands with respect to the rest of the population. For example, the scores 
on the Wechsler Adult Intelligence Scale (WAIS) are norm-referenced: they are 
constructed in such a way as to have a population mean of 100 and a standard 
deviation of 15. A criterion-referenced measure, on the other hand, compares
individuals against an absolute standard. For example, a typing speed of 40 words per
minute denotes a certain degree of skill at the keyboard; scores on the Global 
Assessment Scale (Endicott, Spitzer, Fleiss, & Cohen, 1976) denote specific levels 
of psychological functioning. 
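To make the idea of a norm-referenced score concrete, here is a minimal sketch that re-expresses a raw score as a z score relative to a norm group and then places it on a mean-100, SD-15 metric. This is not how the WAIS is actually scored. The norm-group mean and SD below are hypothetical, as is the function name `deviation_score`.

```python
def deviation_score(raw, norm_mean, norm_sd, mean=100.0, sd=15.0):
    """Re-express a raw score relative to a (hypothetical) norm group
    on a metric with the given mean and SD (default 100 and 15)."""
    z = (raw - norm_mean) / norm_sd   # standing relative to the norm group
    return mean + sd * z

# Hypothetical norms: the norm group's raw scores have mean 40 and SD 8.
print(deviation_score(48, norm_mean=40, norm_sd=8))   # one SD above the mean: 115.0
print(deviation_score(40, norm_mean=40, norm_sd=8))   # at the norm-group mean: 100.0
```

The resulting number has no absolute meaning. It only says where the person stands relative to the norm group, which is exactly what distinguishes a norm-referenced measure from a criterion-referenced one.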

The contrasting approach, idiographic measurement, focuses solely on a single
individual, without reference to others. No attempt at comparison is made. Some 
examples of idiographic methods are the Personal Questionnaire (Elliott et al., 2015; 
Phillips, 1986; Shapiro, 1961a), Q-sorts (e.g., Jones, Ghannam, Nigg, & Dyer, 
1993), and repertory grids (Winter, 2003). Such measures are often used within 
small-N research designs, and are discussed further in that context in Chapter 9. 
Reliability

How do we go about evaluating specific measures? The two main criteria, reliability 
and validity, are derived from a set of assumptions known as classical test theory.
We will first describe them within that framework, and then reconceptualize them in
terms of a newer approach, generalizability theory.

The original idea in classical test theory was that in measuring something, one is 
dealing with consistency across repeated measurements. The consistent part of the 
score, the part that is the same across measurements, is known as the true score. It is 
conceived of either as an ideal score or as the mean of an infinitely large set of scores. 
The observed score is the sum of the true score and error, which is conceived of as a 
random fluctuation around the true score. 

Expressed as an equation, this relationship is x = t + e (where x is the observed score,
t is the true score, and e is the error); this is called the fundamental equation of 
classical test theory. From this simple equation, and a few assumptions about how the 
error score behaves, the theory of reliability can be constructed. 

Reliability refers to the degree of reproducibility of the measurement. If you repeat 
the measurement in various ways, do you get the same results each time? The more 
consistent the measurement, the greater the reliability and the less error there is to 
interfere with measuring what one wants to measure. It is analogous to the
signal-to-noise ratio in electronics. To put it the other way around, unreliability is the
amount of error in the measurement; mathematically speaking, it is the proportion of error
variance in the total score. For example, if you were measuring individuals’ levels of
paranoid ideation using a questionnaire, you would expect their scores to stay roughly 
stable, at least over short time periods. If people’s scores fluctuated widely over a 
two-week interval, the measure would be unreliable and probably not worth using.
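The fundamental equation x = t + e and the definition of reliability as a variance ratio can be illustrated with a small simulation. This is a sketch under stated assumptions: true scores and errors are drawn from normal distributions with an assumed true-score variance of 4 and error variance of 1, so the theoretical reliability is 4 / (4 + 1) = 0.8.

```python
import random
import statistics

random.seed(1)                  # fixed seed so the sketch is reproducible
n = 20_000
true_sd, error_sd = 2.0, 1.0    # assumed: true-score variance 4, error variance 1

t = [random.gauss(50, true_sd) for _ in range(n)]   # true scores
e = [random.gauss(0, error_sd) for _ in range(n)]   # random error around the true score
x = [ti + ei for ti, ei in zip(t, e)]               # observed scores: x = t + e

# Reliability = true-score variance / observed-score variance;
# its complement estimates the proportion of error variance.
reliability = statistics.variance(t) / statistics.variance(x)
print(f"estimated reliability: {reliability:.3f}")           # near 4 / (4 + 1) = 0.8
print(f"estimated error proportion: {1 - reliability:.3f}")  # near 1 / 5 = 0.2
```

In real data the true scores are of course unobservable. The reliability types described below are different practical ways of estimating this ratio.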

Reasonably high reliability is important because it enables you to measure with 
precision, and therefore allows you to discover relationships between variables that 
would be obscured if too much error were present. At the other extreme, if the 
measurement is totally unreliable, you are simply recording random error, not 
whatever it is you want to measure. 

If you are examining how two variables correlate, the effect of unreliability is to 
attenuate their observed relationship, making their underlying relationship more 
difficult to detect. Any relationship between measures of two variables is a joint 
function of the true underlying relationship of the variables and the weakening 
effect of the unreliability of the measures (Nunnally & Bernstein, 1994). For 
example, if you are studying the correlation between social support and depression, 
and your measures of each of those two constructs are somewhat unreliable, the 
correlation which you obtain may be low, even though the underlying relationship 
between the variables is large. The reliability of your measures can have a huge 
impact on your ability to find what you are looking for, especially with small 
samples. 
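The weakening effect described above is usually expressed by the classical attenuation formula: the observed correlation equals the true-score correlation multiplied by the square root of the product of the two measures’ reliabilities. The sketch below applies it, and its inverse (the correction for attenuation), to hypothetical values for social support and depression.

```python
import math

def attenuated_r(true_r, rel_x, rel_y):
    """Observed correlation expected from a true-score correlation
    and the reliabilities of the two measures."""
    return true_r * math.sqrt(rel_x * rel_y)

def disattenuated_r(observed_r, rel_x, rel_y):
    """Correction for attenuation: estimate the true-score correlation."""
    return observed_r / math.sqrt(rel_x * rel_y)

# Hypothetical: true correlation 0.5, reliabilities 0.6 and 0.7.
obs = attenuated_r(0.5, 0.6, 0.7)
print(round(obs, 2))                                  # 0.32
print(round(disattenuated_r(obs, 0.6, 0.7), 2))       # back to 0.5
```

With only moderately reliable measures, a true correlation of 0.5 would be observed as roughly 0.32, which in a small sample may well fail to reach significance.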

However, reliability says nothing about what the measure actually means. It simply 
says that the measurement is repeatable. A thermometer with insufficient liquid in it 
will give very reliable readings, but they will be wrong. The meaning of the measure 
is assessed by its validity, which we will discuss later under a separate heading. 
Reliability types:

• Test-retest 

• Equivalent forms 

• Internal consistency (including split-half) 

• Inter-rater 


Methods for assessing reliability depend on the scale of measurement (nominal or 
interval), the type of measure (self-report or observation), and the type of consistency 
that you are interested in. The most common methods are as follows. 

Test-retest reliability examines consistency over time. The measure is administered 
to the same set of people on two or more separate occasions (e.g., a week or a month 
apart). Its test-retest reliability (sometimes called the stability coefficient) is assessed 
by the correlation between the scores from the different time points. There may be a 
problem with practice effects, unless these are uniform across individuals (in which 
case the overall mean would be affected, rather than the correlation). This is the most 
appropriate type of reliability when you are considering change over time, or when 
you are combining repeated measurements to produce a single index (e.g., therapeutic 
alliance over the first three sessions of treatment). 
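A test-retest coefficient is simply a Pearson correlation between the two administrations. The sketch below, using hypothetical scores for six people, also illustrates the point about practice effects: if everyone gains the same amount at retest, the mean changes but the correlation does not.

```python
import math
import statistics

def pearson_r(a, b):
    """Pearson correlation between two equal-length lists of scores."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(ss_a * ss_b)

# Hypothetical scores for six people tested a week apart.
time1 = [12, 18, 9, 22, 15, 11]
time2 = [13, 17, 10, 21, 16, 12]
r = pearson_r(time1, time2)

# A uniform practice effect (everyone gains 3 points) shifts only the mean.
time2_practiced = [s + 3 for s in time2]
print(round(r, 3), round(pearson_r(time1, time2_practiced), 3))   # identical values
```

A non-uniform practice effect, where some people improve more than others, would change the rank ordering and therefore lower the coefficient.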

Equivalent forms reliability is rarely found these days, but may still be encountered 
in test manuals. It examines reliability across different versions of the same instrument. 
It is an extension of test-retest reliability, where instead of readministering the same 
measure on the second occasion, you use an alternate (or “equivalent” or “parallel”) 
form. (Some instruments have a Form A and a Form B to facilitate this.) Again, the 
reliability coefficient is the correlation between the scores on the two administrations. 

Internal consistency is the standard way of assessing the inter-item reliability of a scale 
that is composed of multiple similar items (many self-report measures fall into this
category). The assumption is that the items are equivalent or parallel, that is, that they all
aim to tap the same underlying construct. For instance, two parallel items on the Client
Satisfaction Questionnaire (CSQ-8: Larsen, Attkisson, Hargreaves, & Nguyen, 1979),
a widely used self-report scale assessing clients’ satisfaction with psychological and 
other healthcare services, are “Did you get the kind of service you wanted?” and 
“Overall, in a general sense, how satisfied are you with the service you received?” Even 
though these items ask slightly different questions, they are assumed to be tapping the 
same psychological construct: satisfaction with the service. Internal consistency, 
figuratively speaking, is a way of assessing how well all the items of the scale hang 
together: are they all measuring the same thing (high consistency) or different things 
(low consistency)? Overall scale reliability is estimated from the covariances of all the 
items with each other, typically assessed using Cronbach’s alpha (see below). 
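Cronbach’s alpha is commonly computed from the item variances and the variance of the total score. Because the total-score variance equals the sum of the item variances plus twice the sum of the inter-item covariances, this is the same as working from the covariances. The sketch below uses a small hypothetical data set (four respondents, three items).

```python
import statistics

def cronbach_alpha(rows):
    """Cronbach's alpha from a respondents x items matrix (list of lists).

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(rows[0])
    items = list(zip(*rows))                     # one tuple of scores per item
    item_vars = sum(statistics.variance(col) for col in items)
    total_var = statistics.variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - item_vars / total_var)

# Hypothetical responses: four respondents, three satisfaction items (1-4 scale).
data = [[1, 2, 2],
        [2, 2, 3],
        [3, 4, 3],
        [4, 4, 4]]
print(round(cronbach_alpha(data), 2))   # 0.93
```

If the items were unrelated, the total-score variance would roughly equal the sum of the item variances and alpha would be near zero. The more the items covary, the larger the total variance and the higher alpha.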

Split-half reliability is now mostly of historical interest, but, like parallel forms
reliability, will be found in some test manuals. It is an old form of internal consistency,
used prior to the development of high-speed computers because it was easier to
calculate. Split-half reliability was assessed by dividing a measure into two equivalent
halves (e.g., odd and even items), then correlating the two halves with each other. It 
has been replaced by Cronbach’s alpha. 
Inter-rater reliability is used for observational rather than self-report measures in
order to check the reliability of observations. For example, researchers may be
interested in measuring therapist empathy in a therapeutic interaction, or in estimating
children’s mental ages from their drawings. The researchers making the ratings may
be referred to as coders, raters, or judges; their inter-rater reliability is the extent to
which their ratings agree or covary with each other (see the next section for
computational details). There are two separate issues: how good the rating system is as a
whole, and how good the individual raters are (for example, should one be dropped?). (See
Chapter 7 for a further discussion of the rating process.)


Reliability Statistics 

A variety of different statistics are used to measure reliability. The first step is 
to establish which scale of measurement is involved, since this determines the 
reliability statistic. For practical purposes only nominal and interval scales need 
be considered: as noted earlier, ordinal scales can generally be analyzed as if they 
were interval scales. 

Nominal Scales 

As psychologists frequently need to calculate the reliability of nominal scale data, we 
will illustrate the calculations using a simple example. Suppose that two psychologists 
who work in a specialized psychology outpatient service each categorize patients into 
three diagnostic groups (generalized anxiety, phobia, and other), and they want to
know how similar their judgments are. Since there is no ordering implied in the 
categories, it is a nominal scale. 

The first thing to do with two sets of categorical measurements (e.g., judgments 
across raters, occasions, instruments, or settings) is to display the data in a two-way 
classification table. Table 4.2 gives some possible data from the diagnostic classification 
study with 100 patients. 

The obvious initial thing to do is to calculate the percentage agreement between the 
clinicians. This is computed from the total number of observations in the agreement 
cells of the table (the diagonal cells, where both raters chose the same category), divided by the total number
of observations. In the example, the agreement is (10 + 20 + 20)/100 = 0.50, or 50% 
agreement. 


Table 4.2 Simplified example of a two-way classification table

                                                Rater 1
Rater 2                  Generalized anxiety    Phobia    Other    Total
Generalized anxiety              10               20         0        30
Phobia                           10               20        10        40
Other                             0               10        20        30
Total                            20               50        30       100
However, since raters categorizing patients at random would still agree by chance
part of the time, a way to control for chance agreement is desirable. Cohen’s kappa (κ)
is used to accomplish this (Cohen, 1960). The formula is: 

$$\kappa = \frac{p_o - p_c}{1 - p_c}$$

where $p_o$ is the proportion of agreement observed (i.e., the total of the numbers in the
agreement cells of the table divided by the grand total), namely, 0.50 in the example
above. $p_c$ stands for the proportion of agreement expected by chance alone. To
calculate $p_c$, first calculate the proportion of observations in each row and column by
dividing each row and column total by the grand total. Then $p_c$ is calculated by
multiplying corresponding row and column proportions by each other and adding the
resulting numbers together. In the example, $p_c$ is given by 0.3*0.2 + 0.4*0.5 + 0.3*0.3 = 0.06
+ 0.20 + 0.09 = 0.35. This means that the level of agreement due to chance alone is 35%.

Using the above formula for Cohen’s kappa, the corrected agreement statistic is
therefore $\kappa = (0.50 - 0.35)/(1 - 0.35) = 0.23$, which is not very good (see Table 4.4).
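The kappa calculation just described (observed agreement from the diagonal cells, chance agreement from the products of matching row and column proportions) can be written as a short function. It is applied here to the counts from Table 4.2. Kappa is the same whichever rater is placed on the rows.

```python
def cohen_kappa(table):
    """Observed agreement, chance agreement, and Cohen's kappa for a square
    two-way classification table (rows = one rater, columns = the other)."""
    n = sum(sum(row) for row in table)
    p_o = sum(table[i][i] for i in range(len(table))) / n   # agreement cells
    row_p = [sum(row) / n for row in table]                  # row proportions
    col_p = [sum(col) / n for col in zip(*table)]            # column proportions
    p_c = sum(r * c for r, c in zip(row_p, col_p))           # chance agreement
    return p_o, p_c, (p_o - p_c) / (1 - p_c)

# Counts from Table 4.2 (generalized anxiety, phobia, other).
table_4_2 = [[10, 20, 0],
             [10, 20, 10],
             [0, 10, 20]]
p_o, p_c, kappa = cohen_kappa(table_4_2)
print(f"p_o={p_o:.2f} p_c={p_c:.2f} kappa={kappa:.2f}")   # p_o=0.50 p_c=0.35 kappa=0.23
```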

With nominal scale data it is further possible to analyze the reliability of any 
particular category within the scale. That is, you can determine which categories have 
good agreement and which do not. This is done by carrying out a series of analyses in 
which you collapse the scale into two categories: the category of interest and all other 
categories combined. In the example above, the researchers might be interested in the 
reliability of the generalized anxiety category. They would then form a smaller, two- 
by-two table, amalgamating the two other categories, and calculate Cohen’s kappa for 
that table (we leave this for the reader to calculate). 
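For readers who want to check their own answer, the collapsing procedure can be sketched as follows. The helper `collapse` (a hypothetical name) builds the two-by-two table for one category versus all the others, and kappa is then computed for it. The loop repeats the analysis for each category in turn.

```python
def kappa(table):
    """Cohen's kappa for a square agreement table (rows and columns = raters)."""
    n = sum(map(sum, table))
    p_o = sum(table[i][i] for i in range(len(table))) / n
    # Row i is paired with column i: product of matching marginal proportions.
    p_c = sum((sum(row) / n) * (sum(col) / n)
              for row, col in zip(table, zip(*table)))
    return (p_o - p_c) / (1 - p_c)

def collapse(table, k):
    """Collapse a square table to 2x2: category k versus all other categories."""
    others = [i for i in range(len(table)) if i != k]

    def cell(rows, cols):
        return sum(table[r][c] for r in rows for c in cols)

    return [[cell([k], [k]), cell([k], others)],
            [cell(others, [k]), cell(others, others)]]

table_4_2 = [[10, 20, 0], [10, 20, 10], [0, 10, 20]]
for k, label in enumerate(["generalized anxiety", "phobia", "other"]):
    print(label, round(kappa(collapse(table_4_2, k)), 2))
```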

Ordinal and Interval Scales 

With ordinal and interval scale measurements, there are several choices for assessing 
reliability. To begin with, if you are using a cutoff point on an interval scale (e.g., an
observational rating scale for depression), you may turn it into a binary nominal scale (e.g.,
“depressed-nondepressed”). You could then use Cohen’s kappa to compute reliability. 
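The dichotomize-then-kappa approach can be sketched as below. Both observers’ ratings (on a hypothetical 0 to 10 scale) and the cutoff of 6 are invented for illustration. Note that information is lost in the process: two raters who differ by several points on the same side of the cutoff count as agreeing.

```python
def dichotomize(scores, cutoff):
    """1 = 'depressed' (score at or above the cutoff), 0 = 'nondepressed'."""
    return [int(s >= cutoff) for s in scores]

def binary_kappa(a, b):
    """Cohen's kappa for two lists of 0/1 classifications of the same people."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n              # proportion classed 'depressed'
    p_c = pa * pb + (1 - pa) * (1 - pb)          # chance agreement
    return (p_o - p_c) / (1 - p_c)

# Hypothetical observer ratings (0-10) of eight patients; hypothetical cutoff of 6.
rater_a = [2, 7, 5, 9, 3, 6, 8, 1]
rater_b = [3, 6, 6, 8, 2, 5, 9, 1]
print(round(binary_kappa(dichotomize(rater_a, 6), dichotomize(rater_b, 6)), 2))   # 0.5
```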

More commonly, however, the researcher calculates the association between the 
two measurements using Pearson’s correlation coefficient, r. This statistic is usually 
robust enough to use in most applications (Nunnally & Bernstein, 1994). If more 
than two raters are involved, Cronbach’s alpha can be used with raters treated as 
items; a more complicated statistic known as the intraclass correlation can also be used
(Shrout & Fleiss, 1979). 

In the common situation where a scale is formed from multiple items that are 
averaged or totaled together, Cronbach’s alpha is the standard index of the reliability 
of the pooled observations (overall score across items, pooled judges’ ratings, or
observations pooled over time). The SPSS Reliability procedure (or any other reasonably
complete statistics software package) can usually be used to perform the computations. 
The reliability of the whole scale will be higher than the average inter-item correlation, 
because adding together multiple measures averages out the errors in each of them. 

Since the internal consistency of a scale increases with the number of items in the 
scale, it is easier to get higher reliabilities with, say, a 24-item scale than with an 8-item 
one. Thus you might want to see how much increasing your scale by various amounts 
would improve its reliability. The reliability of such combined measurements can be
calculated using the Spearman-Brown Prophecy Formula (Nunnally & Bernstein, 1994):

$$r_{kk} = \frac{k\, r_{11}}{1 + (k - 1)\, r_{11}}$$

where $r_{kk}$ refers to the reliability of the combined measurements; $k$ = the factor by
which you are increasing the scale (a fraction if you are making it shorter); and $r_{11}$ is
the original reliability coefficient. (This formula yields the same results as the
standardized Cronbach alpha statistic.)

Two examples of common uses of this formula may clarify its application. In the 
first example, suppose that you have an 8-item scale with a reliability of 0.6, and you 
want to know how reliable a 24-item version made up of similar items would be. 
In this case, $r_{11}$ is equal to 0.6 and $k$ is 3 (because the new scale is three times as long
as the original one). Then the new reliability would be 3*0.6/(1 + 2*0.6) = 0.82. 

In the second example, suppose that you wish to combine 20 parallel items with an 
average intercorrelation of only 0.3 into a scale. Surprisingly, the scale thus formed 
would have an excellent reliability of 0.90 (= 20*0.3/(1 + 19*0.3)), proof that,
statistically speaking, if you just have enough sow’s ears, it is possible to make quite a 
nice leather purse (even if it isn’t silk!). 
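Both worked examples, and the more general question of how much lengthening (or shortening) a scale would change its reliability, can be reproduced with a one-line implementation of the Spearman-Brown formula. The list of scale lengths in the loop is purely illustrative.

```python
def spearman_brown(r11, k):
    """Predicted reliability when a scale is lengthened by a factor k
    (k < 1 for a shorter scale), given the original reliability r11."""
    return k * r11 / (1 + (k - 1) * r11)

print(round(spearman_brown(0.6, 3), 3))    # 8 -> 24 items: 0.818
print(round(spearman_brown(0.3, 20), 3))   # 20 items, mean inter-item r 0.3: 0.896

# How would lengthening (or shortening) the 8-item scale (reliability 0.6) help?
for n_items in (4, 8, 12, 16, 24, 32):
    print(n_items, round(spearman_brown(0.6, n_items / 8), 2))
```

The gains diminish as the scale grows: going from 8 to 16 items raises the predicted reliability from 0.60 to 0.75, but doubling again to 32 items adds only about another 0.11.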

Dimensionality 

The above discussion has assumed that the measure is attempting to assess a single 
construct. If, instead, you suspect that it may be capturing several different dimensions, 
for example, on a psychological symptom checklist like the SCL-90-R (Derogatis, 
1994), then factor analysis should be used to investigate the internal structure of the 
measure. The procedure for this is beyond the scope of the present text: readers 
should consult specialist references (e.g., Floyd & Widaman, 1995; Tabachnick &
Fidell, 2013). Cronbach’s alpha should then be used to assess the internal
consistency of each of the resulting subscales.


Validity 

Validity is a more difficult concept to understand and to assess than reliability. 
The classical definition of validity is “whether the measure measures what it is supposed 
to measure.” For example, does a depression scale actually measure depression, or does 
it measure something else, such as self-esteem or willingness to admit problems? 

In this chapter we are discussing validity of measurement. However, Cook and 
Campbell (1979) have articulated a highly influential, broader conception of validity 
which involves design as well as measurement. We will address this in detail in Chapter 8. 

There is a two-step process in developing and evaluating measures: first you look at 
reliability, then validity. Reliability is a necessary but not sufficient condition for 
validity. To be valid, a measure must first be reliable; otherwise it would consist mainly
of error. For example, if two expert raters cannot agree on whether transcripts of a 
therapy session show evidence of client denial, then the validity of the denial category 
cannot be established. On the other hand, a measure can be highly reliable but still 
invalid, for example, head girth as a measure of intelligence. 
Validity may be assessed in several different ways, and a thoroughly researched
measure will report all of them in its manual.

Content Validity 

Content validity assesses whether the measure adequately covers the different aspects 
of the construct that are specified in its definition. For example, does a self-report 
depression scale have items which capture the components of lowered mood, 
decreased motivation, sleep disturbance, etc.? This is a qualitative judgment: there is 
no such thing as a content validity coefficient. 

Face Validity 

Face validity is similar to content validity and assesses whether the measure looks right 
on the face of it, that is, that it self-evidently measures what it claims to measure. For 
instance, the items of a depression scale should ask about low mood, but not about 
attitudes to authority. The Hogan Empathy Scale (Hogan, 1969) has the item 
“I prefer a shower to a bath”—it is not at all obvious, on the face of it, how this relates 
to empathy. Face validity is usually desirable, but not always so. The Minnesota 
Multiphasic Personality Inventory (MMPI), for instance, has a number of “subtle 
items,” which were designed to make the test more difficult to fake (Weiner, 1948). 

Face validity is partly a public relations concept, to make sure that the scale looks right 
to potential respondents, who will become alienated if it does not appear relevant to the 
purpose at hand. For example, a symptom checklist asking about such abnormal
experiences as psychosis or suicide may be inappropriate for research in family practice settings
because it will put people off. Like content validity, face validity is a qualitative concept: 
there is no face validity coefficient. Face validity, in the sense of “resonance” with the 
reader, is also a key criterion for evaluating qualitative research findings (see Chapter 5).

Criterion Validity 

Criterion validity is a central validity consideration. It assesses how well the measure 
correlates with an established criterion or indicator of the construct it is measuring. It 
is divided into concurrent and predictive validity, depending on whether the criterion 
is measured at the same time or later on. For concurrent validity, the scale is correlated 
with a current criterion: a depression scale could be correlated with clinicians’ ratings 
of depression. For predictive validity, the scale is correlated with a future criterion: a
hopelessness scale could be used to predict future suicide attempts. The validity 
coefficient in both cases is the correlation between the measure and the criterion. 

Note that seeing whether a measure can predict membership of two separate 
criterion groups (e.g., can a depression scale distinguish between depressed and 
nondepressed patients?) also falls under this heading. It is an example of concurrent 
validity, though it is often wrongly referred to as discriminant validity, which is a 
different concept altogether (see below). 

If the measure is being used for diagnostic reasons, it is useful to specify its 
sensitivity and specificity. Sensitivity is an index of how well the measure picks out 
those patients who have the target condition (i.e., how few false negatives there 
are); specificity is an index of how well it avoids picking out those patients who do 
not have the target condition (i.e., how few false positives there are). Thus a depression
scale, with a given cutoff point, would have high sensitivity if it identified almost all of
the depressed patients in the sample, and high specificity if it did not identify any
nondepressed people as being depressed.

Table 4.3 Possible results of a binary diagnostic test for depression

                                               Actual status
                                   Depressed                Non-depressed             Totals
Depressed according to test        True positives (9)       False positives (10)      Test positive (19)
Non-depressed according to test    False negatives (1)      True negatives (80)       Test negative (81)
Total                              Actual positives (10)    Actual negatives (90)     Grand total (100)

This can be depicted in a two-by-two table (see Table 4.3), with columns giving the 
numbers who actually have and do not have the condition (i.e., depression in the 
current example; other examples are medical tests, e.g., for heart disease or cancer).
The rows give the numbers that the test indicates have and do not have the condition. 

Then the sensitivity = number of true positives / total number of actual positives. 
In the numerical example, this is 9/10 = 0.90. 

The specificity = number of true negatives / total number of actual negatives. In the 
example, this comes to 80/90 = 0.89. 

It is also useful to calculate the positive predictive value of the test, which is the 
proportion of those testing positive who actually have the condition, and the negative 
predictive value , which is the proportion of those testing negative who don’t have the 
condition. In the example, the positive predictive value is 9/19 = 0.47, and the
negative predictive value is 80/81 = 0.99. Thus we can be reasonably sure that a negative
test result rules out the condition, but much more uncertain (less than 50% sure) that 
a positive test result means that the individual has the condition. This reflects a
problem with low base rates (the base rate is 10/100 = 0.10 in the example), initially
identified by Meehl and Rosen (1955), which means that even a test with good 
sensitivity and specificity, like the hypothetical one in this example, may still not be very 
useful for individual classification if the condition is relatively rare in the population. 
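The indices above can be computed directly from the four cells of Table 4.3. The second function is a sketch of the base-rate problem: it uses Bayes’ theorem to show what the positive predictive value would become if the same test (same sensitivity and specificity) were used where the condition is rarer. The lower base rates in the loop are hypothetical.

```python
def diagnostic_indices(tp, fp, fn, tn):
    """Indices for a binary test checked against a gold-standard criterion."""
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),    # true positives / actual positives
        "specificity": tn / (tn + fp),    # true negatives / actual negatives
        "ppv": tp / (tp + fp),            # positive predictive value
        "npv": tn / (tn + fn),            # negative predictive value
        "base_rate": (tp + fn) / total,   # proportion who actually have the condition
    }

def ppv_at_base_rate(sens, spec, base_rate):
    """Positive predictive value implied by Bayes' theorem."""
    hits = sens * base_rate
    false_alarms = (1 - spec) * (1 - base_rate)
    return hits / (hits + false_alarms)

# Counts from Table 4.3.
indices = diagnostic_indices(tp=9, fp=10, fn=1, tn=80)
for name, value in indices.items():
    print(f"{name}: {value:.2f}")

# Same test, progressively rarer condition (hypothetical base rates).
for base_rate in (0.10, 0.05, 0.01):
    ppv = ppv_at_base_rate(indices["sensitivity"], indices["specificity"], base_rate)
    print(f"base rate {base_rate:.2f}: PPV {ppv:.2f}")
```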

In practice, when one is developing a test and deciding where to place the cutoff points, 
there is a trade-off between sensitivity and specificity. If it is important to avoid false
negatives (e.g., in assessing suicidal or homicidal risk), then the cutoff point is set lower, which
results in more false positives (people who are assessed as dangerous but are in fact not),
but this may be regarded as an acceptable price to pay in order to save lives in the future. 
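The trade-off can be seen by sweeping the cutoff over a set of scores. In this sketch, the ten scores and their true diagnostic status are hypothetical, and a score at or above the cutoff counts as a positive test result.

```python
def sens_spec(scores, has_condition, cutoff):
    """Sensitivity and specificity when scores >= cutoff count as 'positive'."""
    pairs = list(zip(scores, has_condition))
    tp = sum(s >= cutoff and c for s, c in pairs)
    fn = sum(s < cutoff and c for s, c in pairs)
    tn = sum(s < cutoff and not c for s, c in pairs)
    fp = sum(s >= cutoff and not c for s, c in pairs)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical scores and gold-standard status for ten people.
scores = [3, 5, 6, 8, 9, 2, 4, 6, 7, 1]
status = [False, False, True, True, True, False, True, False, True, False]

for cutoff in (3, 5, 7):
    sens, spec = sens_spec(scores, status, cutoff)
    print(f"cutoff {cutoff}: sensitivity {sens:.1f}, specificity {spec:.1f}")
```

Lowering the cutoff from 7 to 3 raises sensitivity from 0.6 to 1.0 but drops specificity from 1.0 to 0.4: fewer missed cases at the price of more false alarms.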

For criterion validity, the measurement that is being used as the criterion must be 
well established and of unquestionable validity itself. Such criteria are often referred 
to as ‘gold standard’ (a clear application of the correspondence theory of truth). 
In cases where there is no established gold standard criterion, considerations of
construct validity are adopted instead.

Criterion validity is basically about the practical value of the measure: how well it 
performs in predicting the criterion. It is less concerned with the underlying construct 
that the measure is capturing, or the theory that links test and criterion (Strauss & 
Smith, 2009). The Beck Hopelessness Scale may have good predictive validity for 
suicide attempts, but that does not necessarily imply that it is a good measure of
hopelessness. To establish whether a measure is actually assessing the psychological 
construct it was intended to assess, considerations of construct validity must be addressed. 

Construct Validity 

Construct validity is a complex consideration. As its name suggests, it examines the
validity of a construct as well as that of the individual methods of measuring that
construct, which were the focus of the previous validity types (Cronbach & Meehl, 1955).
It asks whether
the pattern of relationships between measures of that construct and measures of other 
constructs is consistent with theoretical expectations: how it fits with what Cronbach 
and Meehl (1955) termed the “nomological net.” Construct validity is established by 
accumulating studies which test predictions about how the construct in question 
should relate to other constructs and measures (Strauss & Smith, 2009). 

In one classical type of construct validity study, the relevant associations are 
displayed in a multitrait-multimethod matrix (Campbell & Fiske, 1959). This is a
table that sets out the correlations between several ways of measuring several different 
constructs. For example, if a researcher were interested in the construct validity of 
public-speaking anxiety, she might measure it by using, say, two different self-report 
scales, in addition to an observational measure and measures of heart rate and galvanic 
skin response taken while the person is speaking. In addition she would collect 
comparable self-report and observational measures from the same people on different 
constructs, such as IQ, trait anxiety, extraversion, and self-esteem. The multitrait- 
multimethod matrix displays the correlations among all of these variables. 

The matrix reveals the extent to which measures of the construct of interest are 
positively correlated with measures of related constructs (convergent validity) and
uncorrelated or weakly correlated with measures of unrelated constructs (discriminant
validity). In the above example, all of the different measures of public-speaking anxiety 
would be expected to correlate at least moderately with each other. They would also be 
expected not to correlate significantly with age or IQ, and to correlate only moderately 
with trait anxiety and self-esteem, but more highly with extraversion. 

The multitrait-multimethod matrix also reveals the extent of method variance, the
tendency of measures of a similar type to correlate together. For example, scores from 
self-report measures are often moderately intercorrelated, even though they were 
designed to assess quite different constructs. This is why it is desirable, where
possible, to use different measurement methods within a study or research program, and
not to rely on any one viewpoint or type of measure. 
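The logic of the multitrait-multimethod matrix can be illustrated with a small simulation, a simplified version of the public-speaking example with two traits and two methods. Everything here is an assumption: the traits are independent, each measure loads 1.0 on its trait and 0.5 on a shared method factor, and random error has SD 0.5. Under these assumed loadings the population correlations are about 0.67 for the same trait measured by different methods (convergent validity), about 0.17 for different traits measured by the same method (pure method variance), and about 0 for different traits measured by different methods.

```python
import math
import random

def pearson_r(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    ss = sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(ss)

random.seed(42)    # fixed seed so the sketch is reproducible
n = 5000
anxiety = [random.gauss(0, 1) for _ in range(n)]       # trait 1
esteem = [random.gauss(0, 1) for _ in range(n)]        # trait 2, unrelated to trait 1
self_method = [random.gauss(0, 1) for _ in range(n)]   # shared self-report method factor
obs_method = [random.gauss(0, 1) for _ in range(n)]    # shared observer method factor

def measure(trait, method):
    """Observed score = trait + 0.5 * method factor + random error (SD 0.5)."""
    return [t + 0.5 * m + random.gauss(0, 0.5) for t, m in zip(trait, method)]

measures = {
    "anxiety/self-report": measure(anxiety, self_method),
    "anxiety/observer": measure(anxiety, obs_method),
    "esteem/self-report": measure(esteem, self_method),
    "esteem/observer": measure(esteem, obs_method),
}

# Print the lower half of the multitrait-multimethod correlation matrix.
names = list(measures)
for i, a in enumerate(names):
    for b in names[:i]:
        print(f"{a:20} x {b:20} r = {pearson_r(measures[a], measures[b]):5.2f}")
```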


Generalizability Theory 

An alternative to classical test theory is generalizability theory, which was developed
by Cronbach, Gleser, Nanda, and Rajaratnam (1972). It uses a multifactorial
model rather like analysis of variance (see Shavelson, Webb, & Rowley, 1989;
Wasserman, Levy, & Loken, 2009). It asks, “To which conditions of observation
can a particular observation be generalized?” or “Of which other situations can a
measurement be considered to be representative?” It de-emphasizes the concept of
the true score in favor of the central activity of analyzing sources of variation in
the scores. The theory deliberately blurs the distinction between reliability and
validity, a distinction that, it turns out, is not clear-cut even within classical test
theory (Campbell & Fiske, 1959).

Table 4.4 How reliability and validity involve generalizing across measurement facets

Facet to generalize across                          Traditional psychometric concept

Observers: across raters, judges                    Inter-rater reliability

Occasions: across time                              Test-retest reliability
                                                    Predictive validity

Instruments: across various ways of measuring       Equivalent forms reliability
the same thing (including individual items)         Internal consistency
                                                    Concurrent validity
                                                    Convergent validity

Settings: across situations (usually going          Criterion validity
from more to less controlled situations)            Convergent validity

Generalizability theory assumes that measurement comprises three elements: 
persons, variables, and facets (or conditions) of measurement. Four facets can be 
distinguished: observers, occasions, instruments, and settings. Generalization across 
these facets corresponds to several of the traditional psychometric concepts (see 
Table 4.4). 

In other words, generalizability theory examines the confidence with which you 
can generalize measurements to other observers, occasions, instruments, or
settings. If you are developing a test or scale, it is a good idea to define these
conditions, and determine generalizability across the desired range. Such an examination
is referred to as a generalizability study, and is typically set up as a multifactorial
research design (see Chapter 8) that incorporates each relevant facet as a factor.
However, even if you do not actually carry out such a study, the conceptual
framework of measurement facets is still useful for understanding the factors important to
your instruments. 
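As an illustration, here is a minimal sketch of the simplest generalizability study: a one-facet design in which every person is rated by every rater (persons x raters, crossed). Variance components are estimated from ANOVA mean squares, and the relative G coefficient gives the dependability of the mean rating. The final loop is a simple "decision study" asking how dependability would change with a different number of raters. The ratings are hypothetical. Setting negative variance estimates to zero is one common convention. For this particular design, the relative G coefficient equals Cronbach’s alpha computed with raters treated as items.

```python
def g_study(scores):
    """One-facet crossed G study (persons x raters) from ANOVA mean squares.

    scores: one row per person, one column per rater.
    Returns (person variance, residual variance, relative G coefficient
    for the mean of the raters actually used).
    """
    n_p, n_r = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n_p * n_r)
    person_means = [sum(row) / n_r for row in scores]
    rater_means = [sum(col) / n_p for col in zip(*scores)]
    ss_p = n_r * sum((m - grand) ** 2 for m in person_means)
    ss_r = n_p * sum((m - grand) ** 2 for m in rater_means)
    ss_tot = sum((x - grand) ** 2 for row in scores for x in row)
    ss_res = ss_tot - ss_p - ss_r              # person x rater interaction + error
    ms_p = ss_p / (n_p - 1)
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))
    var_p = max((ms_p - ms_res) / n_r, 0.0)    # person (universe-score) variance
    var_res = ms_res
    return var_p, var_res, var_p / (var_p + var_res / n_r)

# Hypothetical ratings of three clients by two raters.
ratings = [[2, 3], [4, 5], [6, 6]]
var_p, var_res, g = g_study(ratings)
print(round(var_p, 2), round(var_res, 3), round(g, 3))

# Decision study: projected dependability with 1, 2, or 4 raters.
for n_raters in (1, 2, 4):
    print(n_raters, round(var_p / (var_p + var_res / n_raters), 3))
```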

The more qualitative or conceptual forms of validity (content validity, face validity, and the
more complex forms of construct validity) do not fit neatly into the generalizability theory
framework. They can be treated separately, or could be considered as aspects of a fifth 
facet, level of abstraction: generalization from the specific working definition of the 
variable to other representations—theoretical, empirical, and phenomenological— 
inferred on the basis of what it is theorized not to be as well as what it is related to. 


Item Response Theory 

A second alternative to classical test theory is item-response theory and, like
generalizability theory, it is fairly complicated mathematically. It grew out of dissatisfaction
with the limits of traditional test theory and was originally developed for tests of
knowledge or ability, although it can be applied more generally. In particular,
item-response theory models attempt to repair problems with the unequal intervals and
lack of discriminability that occur with most rating scales. For example, the five-point 
Likert scales used on common measures of psychological distress typically have serious 
scaling problems, including violations of the equal interval assumption and a failure to
discriminate between adjacent scale points, such as “moderately” and “quite a bit” 
(Elliott et al., 2006). 

The most straightforward form of item-response theory is Rasch analysis (Bond & 
Fox, 2007), which models a single parameter of item performance, known as
difficulty, defined as how much of the variable being measured has to be present in order
for an informant to endorse a given item at a given level (e.g., at step 3 on a 5-point 
scale). Rasch analysis is geared toward constructing unidimensional measurement 
instruments in which items range from easy to difficult to endorse, arranged in 
uniform steps of equal intervals. It allows researchers to answer a range of questions 
about a measure, such as: 

1. What is the optimal number of rating scale categories for the instrument? 

2. Can we improve the internal reliability of the instrument by dropping misfitting
items and unnecessary scale points? 

3. How many distinct clinical groups (strata) can be distinguished using the 
instrument? 

4. What measurement gaps exist along the continuum measured by the instrument, 
indicating the need for adding or deleting certain types of items? 

5. For a given sample, what sampling gaps exist along the measured continuum? 

6. What can the ordering of items along the measured continuum tell us about what 
the instrument is actually measuring (construct validity)? 

7. Does the construct have different meanings for different client populations? 

The basic idea is to plot the probability of endorsing a particular item (e.g., “fear of 
fainting in public”) at a given level (e.g., 3 = “quite a bit”) against levels of the underlying 
latent trait that the test is trying to assess, for example, social anxiety. This graph, which 
has an elongated S-shape, is known as the item-characteristic curve. It shows how 
difficult the item is to endorse at various levels of the latent construct being measured, and 
is a rapid way to summarize its performance. There are various mathematical models of 
the relationship between the latent construct and the probability of endorsing the item 
at a given level: the one-parameter (or Rasch) model, which estimates only item difficulty; 
the two-parameter model, which adds a discrimination parameter (the steepness of the curve); 
and the three-parameter model, which also adds a lower asymptote, often interpreted as 
guessing. For further details, see Bond and Fox (2007) or Reise and Waller (2009). 
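The logistic models just described can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not code from Bond and Fox (2007) or from any IRT package: it covers only a dichotomous (endorse / do not endorse) item, the function name `icc` and its parameters are hypothetical, `b` stands for item difficulty, `a` for discrimination, and `c` for the lower asymptote, and the scaling constant that some texts include is omitted. Setting a = 1 and c = 0 gives the Rasch curve.

```python
import math


def icc(theta: float, b: float, a: float = 1.0, c: float = 0.0) -> float:
    """Probability of endorsing a dichotomous item under a logistic IRT model.

    theta: respondent's level on the latent trait (e.g., social anxiety)
    b:     item difficulty (trait level at which the curve is steepest)
    a:     discrimination, the slope of the curve (fixed at 1.0 in the Rasch model)
    c:     lower asymptote, often read as guessing (0.0 in the one- and two-parameter models)
    """
    logistic = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return c + (1.0 - c) * logistic


if __name__ == "__main__":
    # Trace the S-shaped curve for a hypothetical item with difficulty b = 0.5.
    for theta in (-3.0, -2.0, -1.0, 0.0, 0.5, 1.0, 2.0, 3.0):
        rasch = icc(theta, b=0.5)
        two_pl = icc(theta, b=0.5, a=2.0)
        three_pl = icc(theta, b=0.5, a=2.0, c=0.2)
        print(f"theta={theta:+.1f}  1PL={rasch:.2f}  2PL={two_pl:.2f}  3PL={three_pl:.2f}")
```

Under the Rasch curve the probability of endorsement is exactly 0.5 when the trait level equals the item's difficulty, which is one way of reading what "difficulty" means on the latent scale; raising `a` makes the S-shape steeper, and raising `c` lifts its lower tail.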

One clinical example of item response theory is Roberson-Nay, Strong, Nay, Beidel, 
and Turner’s (2007) examination of the properties of a short form of the Social Phobia 
and Anxiety Inventory. Detailed analysis of the performance of each item was carried out, 
as a result of which the original 7-point Likert scale was collapsed to a more efficient 
5-point scale. Item response theory is also particularly useful for computerized psychological 
testing, as it enables the computer to present items that are chosen on the basis of the 
participant’s previous performance, thus greatly increasing the efficiency of the testing procedure. 
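One common way to implement this adaptive selection, sketched below under stated assumptions, is to give next whichever unused item carries the most statistical information at the participant's current estimated trait level. For a Rasch item that information is P(1 - P), which peaks when the item's difficulty matches the trait level. The item bank, the item names, and the function names are hypothetical, and the step of re-estimating the trait level after each response is deliberately left out.

```python
import math


def rasch_prob(theta: float, b: float) -> float:
    """Rasch probability of endorsing an item of difficulty b at trait level theta."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_information(theta: float, b: float) -> float:
    """Fisher information of a Rasch item at theta: P * (1 - P), largest when b equals theta."""
    p = rasch_prob(theta, b)
    return p * (1.0 - p)


def next_item(theta_hat: float, bank: dict[str, float], administered: set[str]) -> str:
    """Return the unused item that is most informative at the current trait estimate."""
    remaining = {name: b for name, b in bank.items() if name not in administered}
    if not remaining:
        raise ValueError("item bank exhausted")
    return max(remaining, key=lambda name: item_information(theta_hat, remaining[name]))


if __name__ == "__main__":
    # Hypothetical item bank: item name -> Rasch difficulty.
    bank = {"item_A": -2.0, "item_B": -0.5, "item_C": 0.4, "item_D": 1.5}
    given: set[str] = set()
    theta_hat = 0.3  # assumed current estimate; re-estimation is not shown
    first = next_item(theta_hat, bank, given)
    given.add(first)
    second = next_item(theta_hat, bank, given)
    print(first, second)
```

With a current estimate of 0.3, the item whose difficulty (0.4) lies closest to that estimate is chosen first; operational adaptive tests add stopping rules and content balancing that this sketch ignores.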

Utility 

In addition to reliability and validity, measures also vary in their utility or practical 
value. Measures which are easy to complete, or take little time to administer or score, 
are more convenient than measures which require more skill and time. Another aspect 
of utility is the incremental value of the information provided. Does the instrument 
yield information which has not been obtained from other measures and which 
therefore adds something or can be put to good use? 

For example, the utility criterion weighs against using the Thematic Apperception 
Test as a measure of pre-post change in therapy, because it is time-consuming and 
difficult to administer and score, except in circumstances where it provides critical 
information that can be gathered in no other way. On the other hand, piling on additional 
easy-to-administer self-report outcome measures may also violate utility considerations 
(in addition to imposing an unacceptable burden on the participants), because such 
measures are typically highly intercorrelated and thus do not add useful information. 


Standards for Reliability and Validity 

Reliability and validity calculations are useful both for off-the-shelf measures and for 
measures that you are constructing yourself. The usual practice is to report the reliability of 
new or uncommon measures in the Method section of a research paper. Table 4.5 gives 
some suggested standards for evaluating the reliability and validity of measures. These 
have no logical basis; they are simply rules of thumb that represent current standards in 
the research community (although there are variations between different researchers and 
journals). We have drawn from the recommendations of Kraemer (1981) and Nunnally 
and Bernstein (1994), in addition to our own experience in scale development and 
editorial reviewing (although see Lance, Butts & Michels, 2006, for a cautionary note). 

Statistical significance tests of reliability coefficients are usually irrelevant because they 
are too lenient: the null hypothesis of no agreement at all will be easily rejected 
in most cases, even when agreement is poor. What matters is the magnitude of the 
coefficient, not whether it attains statistical significance. 
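A short calculation shows why significance is such a low bar. The sketch below assumes the reliability coefficient is a Pearson correlation (for example, a test-retest correlation) and uses the Fisher z approximation for the test of zero correlation; other coefficients, such as kappa or intraclass correlations, have their own tests. The function names are our own.

```python
import math
from statistics import NormalDist


def fisher_z_p_value(r: float, n: int) -> float:
    """Approximate two-tailed p-value for H0: rho = 0, using Fisher's z transformation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def smallest_significant_r(n: int, alpha: float = 0.05) -> float:
    """Smallest correlation reaching two-tailed significance at level alpha with n cases."""
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return math.tanh(z_crit / math.sqrt(n - 3))


if __name__ == "__main__":
    for n in (30, 100, 500):
        print(f"n = {n}: r of about {smallest_significant_r(n):.2f} is enough for p < .05")
    # A test-retest correlation of 0.40 falls below even the 'poor' reliability
    # benchmark in Table 4.5, yet it is highly significant with 100 participants.
    print(f"r = 0.40, n = 100: p = {fisher_z_p_value(0.40, 100):.5f}")
```

With 100 participants, a correlation of about 0.20 already reaches p < .05, even though a reliability that low would rate far below "poor" by the standards in Table 4.5.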

Generally speaking, the higher the reliability the better. However, it is possible to have 
too much of a good thing. Reliabilities greater than 0.90 may indicate either overkill (i.e., 
too many items or raters) or triviality (selection of superficial but readily ratable variables). 
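The "too many items" form of overkill can be illustrated with the Spearman-Brown prophecy formula, which predicts the reliability of a scale lengthened by a factor k as kr / (1 + (k - 1)r). The sketch assumes the added items are parallel to the existing ones (equal true-score and error variances), an assumption real items rarely meet exactly, and the function names are hypothetical.

```python
def spearman_brown(reliability: float, k: float) -> float:
    """Predicted reliability of a scale lengthened by a factor k (Spearman-Brown prophecy formula).

    Assumes the added items are parallel to the existing ones.
    """
    return k * reliability / (1.0 + (k - 1.0) * reliability)


def lengthening_needed(current: float, target: float) -> float:
    """Factor by which a scale must be lengthened to move from current to target reliability."""
    return (target * (1.0 - current)) / (current * (1.0 - target))


if __name__ == "__main__":
    # Hypothetical 10-item scale with reliability 0.80.
    items, r = 10, 0.80
    for target in (0.85, 0.90, 0.95):
        k = lengthening_needed(r, target)
        print(f"target {target:.2f}: about {k * items:.1f} items (check: {spearman_brown(r, k):.3f})")
```

For instance, a 10-item scale with a reliability of 0.80 would need to grow by a factor of 2.25 (to roughly 22 or 23 items) to reach 0.90, and to nearly 48 items to reach 0.95: each further gain in reliability costs participants considerably more time.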

Values in validity research (i.e., research which attempts to test predicted relationships 
among constructs) are typically substantially lower than in reliability research 
(i.e., research which attempts to generalize across raters, occasions, or within 
measures). In this case, values of 0.70 or higher generally mean that one is really tapping 
reliability instead of validity (i.e., that the two measures are really measuring the same 
thing instead of two different things that are supposed to be related). Validity values 
of 0.50 can be considered good, and 0.30 acceptable, but these recommendations are 
much more tentative, as they depend considerably on the particular application area. 
In particular, validity coefficients in epidemiological research tend to run much smaller 
than in clinical or personality research. 


Table 4.5 Suggested reliability and validity standards 

               Reliability    Validity 
Good           0.80           0.50 
Acceptable     0.70           0.30 
Marginal       0.60           0.20 
Poor           0.50           0.10 
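The benchmarks in Table 4.5 can be encoded as a simple lookup, for example when screening many coefficients at once. This is a sketch under our own reading of the table: we treat each value as the lower bound of its label, add the two cautions from the text (reliability above 0.90, validity of 0.70 or more), and use hypothetical names throughout. Like the table itself, it supplies rules of thumb, and the validity cut-offs in particular should be adjusted to the application area.

```python
from typing import Optional

# Benchmarks transcribed from Table 4.5, read (our assumption) as lower bounds for each label,
# ordered from the highest to the lowest benchmark.
STANDARDS = {
    "reliability": [(0.80, "good"), (0.70, "acceptable"), (0.60, "marginal"), (0.50, "poor")],
    "validity": [(0.50, "good"), (0.30, "acceptable"), (0.20, "marginal"), (0.10, "poor")],
}


def rate_coefficient(value: float, kind: str) -> tuple[str, Optional[str]]:
    """Label a reliability or validity coefficient and attach any caution from the text."""
    label = "below the table's range"
    for threshold, name in STANDARDS[kind]:
        if value >= threshold:
            label = name
            break
    caution = None
    if kind == "reliability" and value > 0.90:
        caution = "possible overkill (too many items or raters) or triviality"
    elif kind == "validity" and value >= 0.70:
        caution = "may be tapping reliability rather than validity"
    return label, caution


if __name__ == "__main__":
    for value, kind in [(0.85, "reliability"), (0.95, "reliability"), (0.35, "validity"), (0.75, "validity")]:
        print(kind, value, rate_coefficient(value, kind))
```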


CHAPTER SUMMARY AND CONCLUSIONS 

This chapter has examined the theory and philosophical background of psychological 
measurement, looking at how to conceptualize the measurement process and how to 
evaluate the quality of particular measures. The process of going from an underlying 
construct to the measurement of that construct is known as operationalization. There 
are typically several ways to measure any given construct; how it is done depends on 
the research questions, the theoretical framework, and the available resources. 

The quantitative approach to measurement partly derives from the philosophical 
position of positivism, which seeks to model psychology and the social sciences on the 
methods used in the physical sciences. The positivist approach has been heavily 
critiqued, especially by qualitative researchers. 

The central framework for conceptualizing the properties of quantitative measures 
is known as psychometric theory. Making some simple assumptions within psychometric 
theory allows us to develop a set of ideas, known as classical test theory, about 
how to evaluate measures. The central concepts are reliability and validity. Reliability 
concerns the reproducibility of measurement: its principal subtypes are test-retest 
reliability, internal consistency, and inter-rater reliability. Validity assesses the meaning of 
measurement. It can be divided into content, face, criterion, and construct validity. 
Reliability is a necessary, but not a sufficient, condition for validity. Finally, there is 
the concept of utility, which asks: How easy is the measure to administer and what 
information does it add? Two promising alternatives to classical test theory are 
generalizability theory and item-response theory; however, both are more complicated 
mathematically and, in spite of their promise, are not yet in widespread use. 

The measurement criteria of reliability, validity, and utility relate to the four 
epistemological truth criteria discussed in Chapter 2. Criterion validity is an instance of the 
correspondence criterion of truth, while construct validity and internal consistency 
are examples of the coherence criterion. Furthermore, inter-rater reliability is an 
example of the consensus criterion, and utility fits the pragmatist criterion. Thus, the 
different principles of quantitative measurement are all part of a “system of inquiry” 
(Polkinghorne, 1983) into the truth of psychological phenomena. 

Considerations of reliability and validity are central to evaluating quantitative 
measures, but whether they can be extended to qualitative methods is still being 
debated. We will address these issues at the end of the next chapter, after examining 
the rationale behind qualitative approaches in general. 


FURTHER READING 

For classical psychometric theory, Nunnally and Bernstein’s (1994) text gives a 
thorough treatment, and Rust and Golombok (2008) cover recent approaches. 
Haynes, Smith, and Hunsley (2011) give a thorough treatment of the issues from a 
clinical assessment perspective. It is worth becoming acquainted with two classic 
papers in psychometric theory: Cronbach and Meehl (1955) on construct validity and 
Campbell and Fiske (1959) on convergent and discriminant validity. Both are 
difficult to read in their entirety, but dipping into the first few pages of each will 
provide a flavor of the reasoning. Strauss and Smith (2009) cover construct validity, 
examining both historical and contemporary issues. 


QUESTIONS FOR REFLECTION 

1. “Everything that can be expressed in words can be measured quantitatively.” 
Do you agree or not? Why? 

2. What is “measurement error”? If there’s so much of it, why don’t we study what 
it consists of? 

3. How would you go about validating a new measure of one of the central constructs 
in your research area or in your clinical work? 

4. Rasch analysis (the most basic type of item-response theory) is controversial 
within psychology. Why do you think this is? 

Foundations of Qualitative Methods 


KEY POINTS IN THIS CHAPTER 

• Qualitative research uses language as its raw material. 

• It aims to study people’s thoughts, experiences, feelings, or use of language 
in depth and detail. 

• The main advantage of qualitative methods is that they allow a rich description. 

• They draw upon two main philosophical traditions, phenomenology and 
constructionism, although there is considerable diversity within, and overlap 
between, these traditions. 

• Phenomenologists attempt to understand the person’s perceptions and 
experiences. 

• Constructionists focus on how language is used in social interactions, and 
how discourse is affected by culture, history, and social structure. 

• Qualitative approaches can be grouped into four main families: thematic 
analysis, narrative approaches, text-based approaches, and ethnographic approaches. 

• It is possible to specify criteria for evaluating qualitative research studies. 


The raw material for qualitative research is ordinary language, rather than the 
numbers that form the raw material for quantitative research. The language may be 
obtained in many ways. It may be the participant’s own descriptions of himself or 
herself, recorded during a qualitative interview. Or it could be words transcribed from 
a conversation, such as that between a client and a therapist during a therapy session. 
Or it could be something printed, such as a newspaper article or the operational policy 
statement of a hospital’s management committee. It could also take the form of the 
researcher’s field notes of the participants’ behavior, as written down after a qualitative 
observation session. Whatever source it may come from, linguistic data can give the 
researcher rich, deep, and complex information, sometimes referred to as “thick 
description” (Geertz, 1973). These data can be used to understand people’s feelings, 
thoughts, ways of understanding the world, or ways of communicating with others. 

A simplified illustration of the difference between the quantitative and the qualitative 
approach is shown in the differing responses to the question “How are you feeling 
today?” A quantitatively oriented researcher might ask the participant to respond on 
a seven-point scale, ranging from 1 = “Very unhappy” to 7 = “Very happy,” and 
receive an answer of 5, signifying “Somewhat happy.” A qualitative researcher might 
ask the same person the same question, “How are you feeling today?”, but request an 
open-ended answer, which could run something like “Not too bad, although my knee 
is hurting me a little, and I’ve just had an argument with my partner, which is 
upsetting me. On the other hand, I think I might be up for promotion at work, so I’m 
excited about that.” In other words, the quantitative approach yields data which are 
relatively simple to process, but are limited in depth and hide ambiguities; the 
qualitative approach yields a potentially large quantity of rich, complex data which 
may be difficult and time-consuming to analyze. 

It is worth noting in passing that there is one potential source of confusion over the 
meaning of the word “qualitative,” as it also has a second distinct meaning in research 
terminology. In quantitative research, the term “qualitative data” is used to refer to 
nominal scale data, to distinguish it from ordinal or interval scale data (see Chapter 4 
on psychometric theory). Thus, census categories measuring ethnic background 
(white European, black African, Asian, etc.) may be referred to as a “qualitative” 
variable (because it has no ordering property), even though the data are analyzed by 
quantitative methods. However, in this book we will reserve the term “qualitative” 
for data that are collected by open-ended questions or by observations that yield 
verbal descriptions. Simple yes-no responses or nominal categories will be considered 
as a form of quantitative data, since they are narrowly delimited. The qualitative- 
quantitative distinction, as we are using it, boils down to whether the data are 
collected and analyzed as words or numbers (including counts, proportions, multiple 
choice, and yes-no responses). 

However, the difference between quantitative and qualitative approaches to 
research is about much more than the difference between numbers and words; it is 
also about epistemology, the theory of what knowledge consists of (see Chapter 2). 
As we noted in Chapter 4, quantitative research is largely based on the philosophy of 
positivism. Qualitative researchers usually reject positivism, often quite vehemently, 
instead preferring nonrealist naturalistic or interpretative paradigms based on 
developing understanding rather than on testing hypotheses (see Bryman, 1988, 2012; 
Lincoln, Lynham, & Guba, 2011; McGrath & Johnson, 2003; Proctor & Capaldi, 
2006). Rather than claiming to study a universal objective reality, as the positivists do, 
qualitative researchers are more interested in examining the personal meanings and 
subjective interpretations of each individual’s reality. 

Furthermore, in qualitative research, not only are the subjective understandings of 
the research participants foregrounded, but so also is the subjectivity of the researcher. 
Qualitative research is essentially a human encounter, and the researchers themselves 
act as the measuring instrument. Therefore, the researchers’ beliefs, understandings, 
and feelings about the research topic will inevitably influence the collection and 
interpretation of the data. Rather than attempting to eliminate this source of “bias,” as the 
positivists would, many qualitative researchers argue that researcher subjectivity can 
enrich the process of the research and so should be embraced (Gough & Madill, 2012). 

We will describe these epistemological positions more fully below, in the section on 
philosophical background. 

Advantages 

The main advantages of using qualitative methods are: 

• They avoid the simplifications imposed by quantification, since some things 
cannot be easily expressed numerically. That is, they enable more complex aspects 
of experience to be studied and impose fewer restrictions on the data or the 
underlying theoretical models than quantitative approaches. 

• They allow the researcher to address research questions that do not easily lend 
themselves to quantification, such as the nature of individual experiences of a psychological 
condition (e.g., eating disorders) or event (e.g., being a victim of crime). 

• They enable the individual to be studied in depth and detail. 

• The raw data are usually vivid and easy to grasp: good qualitative research reports 
make the participants come alive for the reader. In general, the reports of 
qualitative studies are often more readable than those of quantitative studies 
(except that some qualitative researchers, especially those with postmodernist or 
existential-phenomenological leanings, tend to write in an impenetrable jargon all 
of their own). 

• Qualitative methods are good for hypothesis generation, and for exploratory, 
discovery-oriented research. They permit a more flexible approach, allowing the 
researcher to modify his or her protocol in mid-stream. The data collection is not 
constrained by pre-existing hypotheses. 

• Qualitative self-report methods usually give more freedom to the participant than 
structured quantitative methods. For example, open-ended questions give 
interviewees a chance to respond in their own words and in their own way. 

• They can be used to “give voice” to participants, especially those who are 
disadvantaged or socially excluded and whose experiences are rarely represented in 
psychological research. 

• Since the data collection procedures are less constrained, the researchers may end 
up in the interesting position of finding things that they were not originally 
looking for or expecting. 


Historical Background 

Qualitative methods can be traced back to the ancient Greek historians. For example, 
Herodotus, who is often called the father of history, traveled widely in the ancient 
world and recounted in his Histories the stories he had heard from the people he met. 
His successors down the ages recorded their observations of the people that they 
encountered in their travels. These kinds of observations eventually became 
formalized in the discipline of anthropology. 
In their modern form, qualitative methods were first used in ethnographic fieldwork 
in the early decades of the 20th century. The founders of cultural anthropology, such 
as Malinowski and Boas, conducted ethnographic observations on cultural groups that 
were remote from their own: Malinowski in the Trobriand Islands in Papua New 
Guinea and Boas with the Kwakiutl tribe on the Pacific Northwest Coast of North 
America. They spent many months living with and observing the cultures they were 
studying. In the 1920s and 1930s, sociologists adapted these methods to study 
subcultures within their own society. For example, the “Chicago school” of sociology tended 
to focus on people at the fringes of society, such as criminals and youth gangs. A classic 
example of this genre is Whyte’s (1943) Street Corner Society, which was based on 
fieldwork with an Italian-American youth gang in Boston, Massachusetts. Ethnographic 
methods started out being used to study the “weird and wonderful” (from a Eurocentric 
viewpoint), for example, Pacific Island tribal cultures, and have been brought 
progressively more closely to bear on the investigators’ own culture, culminating in such 
contemporary specialties as medical anthropology, which uses anthropological methods 
to study health and illness in our own culture (Helman, 2008). 

Some ethnographic work is located on the rather fuzzy boundary between social 
science and journalism. A good example is Blythe’s (1979) The View in Winter, a 
moving account of people describing how they experience being old. The distinction 
is that journalism seeks to report accurately and produce an engaging story, whereas 
social science brings a body of theory to bear on the subject matter, or seeks to develop 
theory from the data, and it articulates its assumptions and procedures in order to 
enable replication. 

In clinical research, qualitative methods were first used in case histories (see 
Chapter 9), for instance, Breuer and Freud’s (1895/1955) first cases, which began the 
psychoanalytic tradition, and Watson and Rayner’s (1920) study of “Little Albert,” 
which helped establish the behavioral tradition. There is also a tradition of participant 
observation methods in mental health research, though they are more often conducted 
by sociologists than by psychologists: classic examples are Goffman’s (1961) Asylums 
and Rosenhan’s (1973) “Being sane in insane places” study. 

The two main qualitative data collection methods currently used in clinical 
psychology research are in-depth interviewing (see Chapter 6) and qualitative observation 
(see Chapter 7). There are various different approaches to conceptualizing the 
procedures and the underlying philosophy of qualitative research, which we will look at in 
the remainder of this chapter. In Chapter 12 we will cover approaches to qualitative 
data analysis. 


PHILOSOPHICAL BACKGROUND 

Qualitative research is unfortunately not immune to the usual kinds of factions that 
bedevil most academic enterprises. Although qualitative researchers are united by 
their wish to move beyond the perceived limitations of the quantitative approach, 
they disagree about the underlying epistemology and philosophy of science that 
characterize their endeavors. Here, we will consider two of the main sets of ideas that underpin 
qualitative research: phenomenology and social constructionism. 
Phenomenologists attempt to understand the person’s thoughts, feelings, 
perceptions, and interpretations of the world. Social constructionists, and the postmodernists 
with whom they are often allied, look at language as a social product in itself, 
questioning many of the familiar concepts, such as reality, truth, or the person, that are taken 
for granted in other branches of the discipline. Although these two positions draw on 
distinct philosophical sources, they are not mutually exclusive: some approaches to 
qualitative research take ideas from both phenomenology and social constructionism. 


Phenomenology 


Central tenets of phenomenology: 

• The primary objects of study are people’s experiences and perceived meanings; 

• Understanding is the true end of science; 

• Multiple valid perspectives are possible; 

• Individuals’ perceptions of their life-worlds are based on implicit 
assumptions or presuppositions. 


The word “phenomenology” is itself a bit of a mouthful, and much of the underlying 
theory is couched in off-putting jargon. However, phenomenology is simply the study 
of phenomena (singular: phenomenon), and “phenomenon” is simply a fancy word 
for perception (that is, what appears to us). In any case, the essence of 
phenomenology is relatively simple: it is the systematic study of people’s experiences and ways 
of viewing the world. 

Sometimes the approach is known as “phenomenological-hermeneutic,” to stress 
its interpretive aspect. (“Hermeneutic” is a fancy word for interpretive, and can be 
used interchangeably with it.) However, there is a potential source of confusion here, 
as there is a particular brand of phenomenological research known as “Interpretative 
Phenomenological Analysis” (Smith, Flowers, & Larkin, 2009), which we will discuss 
below. (Interpretative is given here in its British spelling, because the approach 
originated in the UK.) Here, we will use the term “phenomenological” in its general sense, 
to also encompass phenomenological-hermeneutic methods. 

Phenomenological methods in psychology derive from the phenomenological 
movement in philosophy, which developed in the late 19th and early 20th centuries. 
It, in turn, is descended from the rationalist, idealist philosophical tradition of Plato 
and Kant. Husserl was its founder; Brentano, Heidegger, Merleau-Ponty, and Sartre 
were key figures in its development (Jennings, 1986; Spinelli, 2005). Their ideas were 
introduced into psychology by Giorgi, Laing, May, and others (e.g., Giorgi, 1975; 
Laing, 1959; May, Angel, & Ellenberger, 1958), and were a major influence on the 
client-centered, humanistic, and existential approaches to psychological therapy. 

Assumptions 

We can distinguish four central assumptions of phenomenology. First, perception is 
regarded as the primary psychological activity, since our perceptions give rise to what 
we do, think, and feel. Because of this, perceived meaning is more important than 
objective reality, facts, or events. 

Second, understanding is regarded as being the true end of science (in contrast, for 
example, to the aim of causal explanation, prediction, and control that more traditional 
hypothetico-deductive approaches espouse). The goal is to produce understandings of 
the person’s experiences and actions in terms of intentions, purposes, and meanings. 

A third key assumption is that of multiple perspectives, also known as 
“epistemological pluralism.” Each person’s perspective has its own validity (i.e., it is how they see 
things); therefore, multiple, differing perspectives are equally valid and of interest for 
study. These multiple perspectives constitute different life-worlds (in German, 
Umwelten). For example, the same aging oak tree is radically different when perceived 
by the forester, the lost child, the squirrel, or the wood beetle. These life-worlds are 
the object of study for the phenomenologist. 

Fourth, individuals’ perceptions of their life-worlds are based on implicit 
assumptions or presuppositions, which phenomenologists also try to understand. That is, what 
we perceive is built on multiple assumptions about ourselves, others, and the world. 
These assumptions are the taken-for-granted, unquestioned context for our actions 
and perceptions. For example, if an acquaintance greets you with “How are you?”, 
you are not usually expected to give an accurate or detailed answer; in fact, to anyone 
but a close friend, it would seem quite odd to do so - the underlying, taken-for-granted 
assumption is that we respond with a brief, positive answer. Although we 
accept these underlying assumptions, we are not generally aware of them and do not 
question them; they usually only become apparent when someone breaks the unwritten 
rules of social interaction. In other words, they are believed to be part of what 
everybody knows that everybody knows, a “world common to all and taken for granted” 
(Garfinkel, 1967, p. 37). 

A key set of underlying assumptions is known as the “natural attitude” or “mundane 
reason” (Pollner, 1987). This is made up of the unquestioned belief that things are 
what they appear to be, and that all sane persons share the same world. In fact, in 
everyday life it is considered strange or deviant to talk about many of these 
presuppositions, so that their very obviousness at the same time hides them or prevents them 
from being noticed. 

Phenomenological researchers use two key processes - bracketing and describing - 
in order to identify, and reduce the influence of, their preconceptions. 

Bracketing is an attempt to set aside one’s assumptions and expectations, as far as is 
humanly possible (Fischer, 2009). However, because one’s underlying assumptions 
are often hidden, it requires a special act of reflection to identify them. This act has 
been described in several different ways. The most common is “bracketing the natural 
attitude” (or “bracketing” for short). It involves a process of stepping back from the 
phenomenon in order to see it as if from the outside, as if we were the proverbial 
observer from Mars. Bracketing involves a special kind of turning away from the 
natural attitude, in which the researcher does not accept a description as a statement 
about the world, but simply as a statement about an experience of the world. 

In the clinical context, bracketing is one aspect of the process of empathy in such 
exploratory, humanistic psychotherapies as person-centered and experiential therapy. 
When a client says that she is “trapped” in a situation, the client-centered therapist is 
not interested in determining whether this is factually the case; what is important is 
that the client feels trapped (Rogers, 1975). In contrast, beginning therapists 
generally prefer to stay “within the natural attitude” by trying to talk the client out of such 
presumed irrational beliefs, often questioning the facts of the situation. One 
important component of empathy is letting go of one’s own presuppositions in order to 
understand what the client is trying to say. A similar idea is found in the ideal therapist 
state of “evenly hovering, free-floating attention” referred to in the psychoanalytic 
literature (e.g., Greenson, 1967, p. 100). 

A naive approach to bracketing might be to mentally steel oneself and promise to 
give up one’s biases. However, a more fruitful alternative is to begin by carefully 
reflecting on one’s assumptions. At the beginning of a study, the researcher can 
conduct a thought experiment of carrying out the study in imagination in order to 
identify expectations of what it might find. This thought experiment might also be 
repeated at the end of the study in order to identify additional expectations that only 
became clear in the course of the study. These expectations take the place of hypotheses 
in traditional research, but they are not the same. In phenomenological research, 
expectations are not given a place of honor at the end of the introduction; instead, 
they are figuratively locked in a drawer until the study is over. Phenomenological 
research is perhaps most exciting when it uncovers understandings that are 
unexpected or even startling. 

The second step in the empirical phenomenological method is describing. Several 
principles are involved (see, for example, Spinelli, 2005). First, good descriptions 
focus on concrete or specific impressions, as opposed to the abstract or general. 
Second, they avoid evaluative terms such as “good” or “bad” and their many 
synonyms and euphemisms (e.g., “ineffective,” “helpful”), except where these are part of 
the experience itself. Third, they tend to avoid explanations, particularly early in the 
research. The task is to discover meaning, not invent it. This means that, in interview 
studies, interviewers avoid “why” questions or anything that encourages the 
informant to speculate on causes or reasons: such questions encourage intellectualization 
and interfere with the slow, careful process of attending to concrete experience. 

Conclusion 

Phenomenological methods are often congenial to clinicians, since the research aim 
of understanding participants’ experiences has much overlap with the clinical activity 
of exploring a client’s cognitions or feelings. The desire to understand people’s inner 
experience is often what draws psychologists into the profession in the first place. 
The strength of phenomenological approaches is their generation of an in-depth 
understanding of individuals’ experiences. However, as we discuss in the following 
section, some psychologists take a different stance to qualitative work, focusing on 
the person’s discourse rather than on their inner world. 


Social Constructionism 

Social constructionists (constructionists, for short) are interested in how language is 
used to order and manage the world. In contrast to phenomenologists, 
constructionists do not see language as necessarily reflecting the individual’s underlying thoughts 
and feelings; rather, they are interested in how people use language to structure 
things, or to get things done. For example, constructionist researchers have 
examined psychiatric diagnostic systems from the point of view of how diagnosis may be 
used by mental health professionals to impose a particular view of the world on 
people’s experience (e.g., Georgaca, 2000; Harper, 1994). 


Central features of social constructionism: 

• Part of the postmodernist and poststructuralist movements; 

• Nonrealist; 

• “Radical pluralism”; 

• Often focuses on language in text or speech; 

• Interested in language as social action; 

• Does not assume that language reflects cognition; 

• Emphasizes the reflexivity (circular nature) of psychological theory. 


The basis of the constructionist position is an opposition to the realist approach to 
social science, in particular as articulated by adherents of positivism (see Chapters 2 
and 4). Social constructionists reject, or at least dispense with, the assumption of an 
underlying, independent reality (Gergen, 1985; Madill, Jordan, & Shirley, 2000; 
Willig, 2013). They may speak in terms of multiple realities - that each individual
constructs their own personal reality. This rejection of realism is to some extent shared
by the phenomenologists, although the constructionist position tends to be more
forcefully expressed and may be more thoroughgoing: phenomenologists do not
explicitly reject realism; they simply accept that different people may have different
concepts of what reality is.

However, as we have mentioned above, there is a diversity of views within many 
qualitative traditions, and social constructionism is no exception. There is a radical 
version of constructionism that completely rejects any notion of reality. Thus Guba 
and Lincoln (1989) write that their constructivist paradigm “denies the existence of 
an objective reality, asserting instead that realities are social constructions of the mind, 
and that there exist as many such constructions as there are individuals (although 
clearly many constructions will be shared)” (p. 43). Such radical constructionists also 
do not wish to “privilege” one worldview over any other. Thus they see traditional 
scientific methods as one possible way of understanding the world, but would not 
necessarily regard them as being any more valid than other systems of belief, such as 
shamanism or astrology. However, they would accept that scientists’ own criteria for 
validity are meaningful within the scientists’ own domain of discourse. 

Social constructionists pay close attention to language, spoken and written. However,
they analyze language in a different way to researchers working within realist traditions,
who are usually concerned with whether statements expressed in language are
true or not. Thus, if a psychiatrist says, “This patient is paranoid,” a realist approach
would be to see whether the statement was accurate - to ask, in other words, whether
the patient is paranoid or not. In constructionism the focus shifts toward looking at
how people construct their arguments and what work their constructions do: for
example, what rhetorical devices does the psychiatrist use to convince us of the validity
of her position, that the patient is indeed paranoid (see Harper, 1994)?

The type of focus on language also distinguishes constructionism from phenomenology.
Phenomenologists and social constructionists may share the assumption that
objective reality is not of primary concern. Furthermore, phenomenologists also use
spoken language, in the form of qualitative interviews, as their primary medium of
research. However, the phenomenologist is using language to understand the
thoughts and feelings of the participants - to try to understand their inner world. For
radical constructionists, this act of understanding, too, is a social construction, leaving
us with only the process of construction to study, especially as this plays out in language
use. As Reicher (2000) puts it, “language is a form of social action which we use
in order to create our social world. The focus is on how apparent descriptions serve to
manage our social relations. Psychological categories such as beliefs, desires, and even
experience, are only of interest in so far as participants themselves put them to use in
their discourse” (pp. 3-4).


Social constructionism versus constructivism—what’s 
the difference? 

The terms “constructionism” and “constructivism” are often used interchangeably,
but they are not identical. Constructionism usually refers to the view that
the concepts we use - for example, madness or masculinity - are socially determined,
that is, they don’t refer to an independent reality but may vary across
cultures or over time. Constructivism is a more psychological concept; it refers
to the process by which individuals arrive at the constructs they use. One important
example of constructivist thinking is Kelly’s (1955) personal construct
theory, which looks at the central constructs each individual uses in order to
make sense of their world.

Another example of constructivism can be found in contemporary cognitive
therapy. Historically, cognitive therapists have viewed the external world as less
important than how clients make sense of the world. The ancient Stoic philosopher
Epictetus is often quoted: “Men are not disturbed by things, but by the
view which they take of them.” However, as Neimeyer (1993) points out, most
cognitive therapists follow in the realist tradition, in that thoughts are viewed as
rational (and therefore healthy) if they correspond to reality. This realist stance is
similar to that adopted by traditional psychiatry, in which the prime criterion for
psychosis is loss of contact with reality. However, some contemporary cognitive-behavioral
therapists have begun to move toward a constructivist position, which
is, as we have noted, more internally consistent with cognitive therapy’s philosophical roots.


The other feature of many constructionist theories is that they stress the reflexivity 
(from “reflection,” a function of mirrors) of psychological theorizing. By this is meant 
that psychologists are doing the theorizing, but the psychologists themselves, as 
human beings, are also the object of the theory. Furthermore, researchers’ values and 
choices inevitably shape the course of the study. Psychological research is thus a
circular process, which is what the reflexivity metaphor is attempting to capture. Some 
constructionists maintain that the existence of reflexivity undermines any claims of 
psychology to be an objective science (see Gough & Madill, 2012). 

Postmodernism 

Social constructionism is closely aligned with the body of thought known as postmodernism,
or sometimes poststructuralism (the two terms overlap, but are not synonymous).
This can be difficult territory. One major difficulty is that it is often hard to
pin down exactly what many of the authors writing in this tradition are actually saying, 
as their prose style is often opaque, and the ratio of useful ideas to verbiage can seem 
frustratingly low. However, postmodern thought is currently fashionable and much 
discussed within several fields of study, so it is important to come to grips with it. 

A second difficulty, however, is that the term “postmodernism” itself is hard to 
define. Literally, it is a contradiction in terms, since its base meaning is “what comes 
after modernism,” where modern means current, up to date, or in fashion. However, 
it makes more sense when one realizes that modernism was an artistic and intellectual 
movement of the early 20th century, which included modern music, art, architecture,
and literature, as well as positivist philosophy. When this movement began to become 
dated, the new thinking was labeled post-modernism. Today, postmodernism refers to 
a rather loose collection of ideas that have found expression in a number of different 
fields such as literary theory, sociology, and architecture. The key figures are all 
French: the literary theorist Derrida, the historian Foucault, the psychoanalyst Lacan. 
Some of its central themes are: 

• A rejection of grand theories that provide overarching explanations, such as psychoanalysis
or Marxism; instead, micro or composite theories are favored. This is
coupled with a questioning of the personal and social interests that lie behind
scientific theories, particularly where those theories seem to serve the interests of
those in power (Prilleltensky, 1997).

• An intellectual playfulness that borrows from many different traditions within the
same piece of work. For example, postmodernist architecture often quotes from
earlier traditions, inserting a Gothic turret here, a Georgian window there. This is
exemplified in the image of the qualitative researcher as “bricoleur,” a sort of
handy-person who uses whatever is at hand to construct things that are useful but
not elegant (Levi-Strauss, 1958/1963; McLeod, 2011).

• A focus on language. Lyotard (1979) borrows Wittgenstein’s phrase, “language 
games,” to capture both the aspect of playfulness and the idea that language is 
governed by rules. 

• The indeterminacy (ambiguity) of language. Postmodernists stress that all communication
carries multiple meanings, so that understanding is always an act of
interpretation, and what one reader makes of a text may differ from another 
reader’s understanding. 

Constructionism is not identical to postmodernism, and it was first articulated
before postmodernism became popular (e.g., Berger & Luckmann, 1966). However,
the postmodernist viewpoint has been adopted by many social constructionists
(e.g., Gergen, 1994, 2001), and it provides a useful framework in which to understand
their ideas.

Critiques of Postmodernism 

The postmodernist position (and also the extreme versions of constructionism) has
been fiercely criticized, in a debate that has generated much heat, and perhaps a
little light. The main lines of argument are:

• The lack of interest in people’s mental states is open to the same criticisms that 
were earlier leveled at the methodological behaviorists, who adopted an identical 
stance for completely different reasons (i.e., to become, as they saw it, more 
scientific). A psychology that sidesteps the role of inner experience is severely 
limited, and ultimately presents an alienating view of the person. 

• Likewise, the nonrealist emphasis can make research into an ivory tower exercise. 
Reality is important, especially unpleasant reality. Studying rape, child abuse, 
racism, or genocide purely from a discursive viewpoint can easily seem to 
diminish their importance or even appear to deny their existence or the need to 
prevent them. 

• The underlying model of the person is that of a fragmented, unintegrated self; it 
is not the model that many psychological therapists, who are trying to help their 
clients feel more whole, would endorse. Likewise, the person can be viewed as a 
manipulator of language, whose goal is to manage the impression they make or to 
get others to act in a certain way. This is not an image of human beings that can 
support the enterprise of helping people lead more fulfilling, meaningful, and 
honest lives. 

• The language that postmodernists employ is often riddled with impenetrable 
jargon, which seems designed to convey an impression of erudition and profundity. 
Sokal and Bricmont (1999) exposed the ridiculousness of much postmodernist
writing, especially its use of scientific metaphors, in their critique, Intellectual
Impostures. They describe how Sokal managed to publish, in a prestigious journal, a
spoof article entitled “Transgressing the boundaries: Towards a transformative hermeneutics
of quantum gravity,” consisting mostly of postmodern gobbledygook.

Conclusion 

In summary, the strong points of the constructionist position are that it reminds us
to look closely and critically at how language is used to construct reality and to accomplish
practical purposes. What “position” is the speaker or writer trying to adopt, how
is their language being used to bolster this position, and what is finally achieved by 
this? It also stresses the theory-dependent nature of scientific observation: a view that 
the constructionists share with Popper (see Chapter 2), who reached this position 
from a completely different philosophical standpoint. Finally, it stresses the social 
nature of psychological concepts. Instead of treating a concept such as “racism” or 
“mental illness” as an individual trait, constructionists urge us to look at the term in 
its wider social and political context. 

FAMILIES OF QUALITATIVE APPROACHES

There are many different approaches to qualitative research, and it is often hard for 
researchers new to the area to make sense of their similarities and differences. 
Ultimately, the choice of which approach to adopt depends on your specific research 
question(s) and which philosophical tradition you identify with. For didactic purposes, 
we will divide qualitative approaches into four main families, according to the kinds of 
questions they are attempting to address. We will use Pistrang and Barker’s (2012) 
pragmatically based categories: thematic analysis approaches, narrative approaches, 
language-based approaches, and ethnographic approaches. 

Some researchers, however, may regard this classification as simplistic or otherwise 
contentious. For example, Willig (2013) depicts qualitative approaches as being 
arranged on a continuum, ranging from realist to relativist. As in qualitative analysis 
generally, there is no overall best way to organize the material, just various possible 
alternative constructions. 

It is also worth noting that, regardless of the overall classification, there is considerable
diversity within, and overlap between, the various qualitative approaches. Also,
some approaches have different variants under the same label: for example, there is 
more than one version of grounded theory and of discourse analysis. 


Thematic Analysis Approaches 

The first family of approaches is the thematic analysis family. Although we have not 
done a formal audit, it almost certainly covers the most commonly used approaches 
within published clinical psychology research. The characteristic feature of these 
approaches is that they attempt to extract the main themes or concepts that run 
through the body of a data set (which usually consists of qualitative interview transcripts).
This process can be thought of as a qualitative analog of the quantitative
approaches of factor analysis or cluster analysis. 

Content Analysis 

Content analysis (Joffe & Yardley, 2004; Krippendorff, 2013) sits on the boundary
between quantitative and qualitative methods. It is included here to give an indication
of the range of possible thematic analysis approaches, rather than to specifically illustrate
the qualitative approach. Content analysis aims to give the frequencies of the
important content categories in the data set. It is qualitative in that the raw data for
the study is qualitative, but it is quantitative in that the output is a frequency count of
the various themes or codes. For example, the transcript of a therapy session could be
content-analyzed for different types of client emotional expression, such as anger,
sadness, anxiety, and so forth. The output would be something like 15 instances of
anger, 7 of sadness, and 3 of anxiety. Content categories can either be defined before
the research starts (as would probably be the case in this example) or they can be
derived post hoc from the data set. Content analysis can also be conducted in an
automated way. For example, there is computer software to analyze the emotional
content of therapy sessions; this implements the Gottschalk and Gleser (1969) content
analysis of therapeutic interaction, yielding categories such as Anxiety, Hostility
Outward, and Hostility Inward (see http://www.gb-software.com/develop.htm).
Another automated method is Pennebaker’s LIWC (Linguistic Inquiry and Word
Count) computer program (www.liwc.net), which analyzes emotion words and also
some grammatical properties of the text, such as pronoun usage.
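The arithmetic behind content analysis is simple once the coding has been done. The following minimal Python sketch assumes that a human coder has already assigned each transcript segment to one category, so that the data are just (segment, category) pairs; the category names, excerpts, and word lists are hypothetical. The second function imitates, in a very simplified form, the dictionary-based word counting that automated programs rely on; it is not LIWC’s actual dictionary or scoring procedure.

```python
import re
from collections import Counter


def code_frequencies(coded_segments):
    """Count how often each content category was assigned.

    coded_segments: list of (segment_text, category) pairs, one per coded
    segment, as produced by a human coder (an assumption of this sketch).
    """
    return Counter(category for _segment, category in coded_segments)


def category_word_rates(text, dictionary):
    """Return the percentage of words in `text` belonging to each category.

    dictionary: maps a category name to a set of lowercase words. The word
    sets used below are hypothetical, not a validated dictionary.
    """
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    rates = {}
    for category, vocabulary in dictionary.items():
        hits = sum(1 for word in words if word in vocabulary)
        # Guard against division by zero for an empty text.
        rates[category] = 100.0 * hits / total if total else 0.0
    return rates


if __name__ == "__main__":
    # Hypothetical coded excerpts from one therapy session.
    segments = [
        ("I was furious with him", "anger"),
        ("It still makes me cross", "anger"),
        ("I just felt so low", "sadness"),
        ("My heart was racing", "anxiety"),
    ]
    print(code_frequencies(segments).most_common())
    # -> [('anger', 2), ('sadness', 1), ('anxiety', 1)]

    # Hypothetical mini-dictionary for automated word counting.
    mini_dictionary = {
        "first_person_singular": {"i", "me", "my", "mine", "myself"},
        "negative_emotion": {"furious", "cross", "low", "sad", "afraid"},
    }
    print(category_word_rates("I was furious and I felt low", mini_dictionary))
```

Real dictionaries contain many validated entries and deal with word stems and multi-word phrases; the hand-made word sets here serve only to show the principle of expressing category hits as a percentage of all the words in a text.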

Framework Approach 

Framework (Ritchie & Spencer, 1994; Ritchie, Lewis, Nicholls, & Ormston, 2014) is 
a structured approach to thematic analysis. It has some similarities to content analysis, 
but it belongs more firmly in the qualitative camp. Its central feature is that the
researcher develops a detailed coding framework. This is usually done a priori, based
on theory, previous research, or the questions asked in the interview protocol, but it
can also be developed inductively, based on the data collected. The analytic process is
illustrated by charts, which show the instances of each code for each participant, thereby
making the whole process more transparent (although the final step - synthesizing the
codes - may require interpretation). Framework’s origins are in social policy analysis,
but it is particularly popular in medical contexts, since it was featured in Pope,
Ziebland, and Mays’s (2000) influential British Medical Journal paper on “analyzing
qualitative data.”
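Framework’s charts can be thought of as a participant-by-code matrix. The Python sketch below arranges coded excerpts into such a matrix and prints it as a plain-text table. The participant IDs, codes, and excerpts are hypothetical, and we assume the coding itself has already been done; in practice, Framework charts typically hold a brief summary of each participant’s material in every cell rather than raw excerpts.

```python
from collections import defaultdict


def build_chart(coded_excerpts):
    """Arrange coded excerpts into a participant-by-code chart.

    coded_excerpts: iterable of (participant_id, code, excerpt) tuples.
    Returns (participants, codes, chart), where chart[participant][code] is
    the list of excerpts coded that way; participants and codes keep the
    order in which they first appear.
    """
    chart = defaultdict(lambda: defaultdict(list))
    participants, codes = [], []
    for participant, code, excerpt in coded_excerpts:
        if participant not in participants:
            participants.append(participant)
        if code not in codes:
            codes.append(code)
        chart[participant][code].append(excerpt)
    return participants, codes, chart


def render_chart(participants, codes, chart, width=22):
    """Format the chart as a plain-text table; '-' marks an empty cell.

    Cell contents longer than `width` characters are truncated.
    """
    def cell(text):
        return text[:width].ljust(width)

    lines = [cell("participant") + "".join(cell(code) for code in codes)]
    for participant in participants:
        row = cell(participant)
        for code in codes:
            excerpts = chart[participant].get(code, [])
            row += cell("; ".join(excerpts) if excerpts else "-")
        lines.append(row)
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical coded excerpts from two interviews.
    data = [
        ("P01", "support", "sister helped"),
        ("P01", "stigma", "hid diagnosis at work"),
        ("P02", "support", "online forum"),
    ]
    print(render_chart(*build_chart(data)))
```

Reading down a column shows how a code is spread across participants, while reading along a row keeps each participant’s account together; the empty cell for P02 makes visible that the stigma code did not arise in that interview.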

Grounded Theory 

Grounded theory is one of the oldest and most widely used qualitative research
methods. It was developed by two North American medical sociologists, Glaser and
Strauss, in their 1967 book, The discovery of grounded theory: Strategies for qualitative
research. As the title suggests, they were attempting to articulate how qualitative data
could be used not just to provide rich descriptions, but also to generate theory.
Originally used for participant observation research (see Chapter 7), this approach has
come to be used with a range of qualitative material, such as semi-structured interviews,
focus groups, and diaries.

The term “grounded theory” is potentially confusing, as it refers both to a method - a 
set of systematic procedures for analyzing data - and also to the outcome or product 
of the analysis, which is theory “grounded” in the data. The basic process involves 
identifying categories at a low level of abstraction and then building up to more 
abstract theoretical concepts. The end point is often one or more core categories, 
which capture the essence of the phenomenon (see Chapter 12). This process of analysis
occurs concurrently with the process of data collection, and the developing theory
guides the sampling strategy (“theoretical sampling”: see Chapter 10).

The original Glaser and Strauss (1967) volume was more theoretical and polemical 
than practical; it was aimed at challenging the prevailing quantitative paradigm in 
American sociology. The practical implications for researchers, in other words, the 
steps to be taken in actually carrying out a grounded theory study, are developed in 
Glaser (1998) and Corbin and Strauss (2015). (It is worth noting that Glaser and 
Strauss subsequently disagreed about how grounded theory should be done, and 
their texts have different emphases - see Willig, 2013.) 

The grounded theory approach was taken up by psychologists in the 1980s and 
1990s. Articles by Rennie, Phillips, and Quartaro (1988) and by Henwood and 
Pidgeon (1992) were aimed at introducing it to an audience of psychologists. 
Rennie and Brewer’s (1987) study entitled “A grounded theory of thesis blocking”
(i.e., writer’s block among research students) may well be of personal interest to 
some readers of this text. As more psychologists have taken up the invitation of 
Rennie et al., and of Henwood and Pidgeon, grounded theory has become a popular 
approach to qualitative research. 

One example of the method in clinical psychology is Bolger’s (1999) study of the 
phenomenon of emotional pain. The participants were women in a therapy group for 
adult children of alcoholics; they were interviewed on several occasions following 
group therapy sessions in which they had explored painful life experiences. The interviews
focused on how emotional pain was experienced and what was significant in that
experience for them. The core category that emerged from the analysis was labeled
the “broken self,” characterized by the four subcategories of woundedness, disconnection,
loss of self, and awareness of self.

Another well-known example of a grounded theory study, in a more popularized
book format, is Charmaz’s (1991) analysis of the experience of living with chronic 
illness (see box). 


Grounded theory example: Charmaz (1991) 

The sociologist Kathy Charmaz conducted in-depth qualitative interviews 
with people who had a chronic illness. The results, written up in her book 
Good Days, Bad Days (1991), give compelling accounts of the impact of chronic 
illness on people’s lives. In accordance with the grounded theory approach, she 
also used the data to construct a theory of how the person’s experience of time 
changes, and how this impacts on their sense of self. 


Consensual Qualitative Research 

Consensual Qualitative Research (Hill, 2011; Hill, Thompson, & Williams, 1997) 
is a systematic approach drawing on grounded theory and phenomenology and is 
currently widely used in North America. It features the use of multiple qualitative 
analysts and auditors, as well as a system for interpreting the nature of qualitative themes
based on their frequency in the data; for example, themes reported by all or almost all 
informants are defined as “general themes” that may define the experience being 
studied, while “typical themes” (reported by at least half of the informants) are useful 
for constructing a narrative of a typical experience. 
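The frequency labels can be expressed as a simple decision rule. In the Python sketch below we assume that “almost all” means all cases or all but one, and we label anything below half as “variant,” a term used in the wider CQR literature but not defined in the passage above; the exact cut-offs should be checked against Hill (2011). The function name, theme names, and counts are hypothetical.

```python
def cqr_frequency_label(n_cases, n_total):
    """Label a theme by the number of cases (informants) in which it appears.

    Thresholds are assumptions for illustration:
      general - all cases, or all but one
      typical - at least half of the cases
      variant - fewer than half (but at least one)
    """
    if n_total <= 0 or not 0 <= n_cases <= n_total:
        raise ValueError("need n_total > 0 and 0 <= n_cases <= n_total")
    if n_cases == 0:
        return "absent"
    if n_cases >= n_total - 1:
        return "general"
    if 2 * n_cases >= n_total:
        return "typical"
    return "variant"


if __name__ == "__main__":
    # Hypothetical theme counts from a sample of 12 informants.
    theme_counts = {
        "felt understood": 12,
        "fear of judgment": 11,
        "practical help": 7,
        "spiritual meaning": 3,
    }
    for theme, n in theme_counts.items():
        print(f"{theme}: {n}/12 -> {cqr_frequency_label(n, 12)}")
    # -> general, general, typical, variant (in that order)
```

Note that the “all but one” rule behaves oddly with very small samples: with only two informants, a theme reported by one of them would already count as general.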

Empirical Phenomenology 

Empirical phenomenology is another early form of thematic analysis and the oldest 
systematic qualitative research method to emerge in psychology. It is an application 
of the phenomenological method described earlier in this chapter and was developed 
at Duquesne University (Pittsburgh, USA) in the 1970s (Giorgi & Giorgi, 2003). 
Much of the work has been published in the Journal of Phenomenological Psychology 
and the Duquesne Studies of Phenomenological Psychology. Giorgi, Wertz, and Fischer 
are three of the better known proponents. The approach stresses in-depth analysis, 
often at first of single cases, aiming to describe the main defining features of an experience
(e.g., that of being criminally victimized), and the different variations that the
experience may have in the population (analogous to the statistical ideas of mean and 
standard deviation). 

Hermeneutic Approaches 

A somewhat more flexible approach to qualitative research is represented by approaches 
that bill themselves as hermeneutic or phenomenological-hermeneutic (e.g., Packer & 
Addison, 1989). Researchers who describe themselves in this way find other phenomenological
approaches too restrictive and use a wider range of methods. They argue
that it is important to go beyond the surface meaning of research protocols, in order 
to identify the implicit or even unconscious meanings embedded in texts. 

An example is a study by Walsh, Perrucci, and Severns (1999), which used a hermeneutic
approach to explore “good moments” within a videotaped psychotherapy
session. They identified how professionals and students at various stages of training
differed in their values about what constituted good psychotherapy.

Interpretative Phenomenological Analysis 

Interpretative Phenomenological Analysis (IPA: Smith et al., 2009) is, as its name 
suggests, an explicitly phenomenological approach to thematic analysis. Many of the 
other thematic analysis approaches can also be used from a phenomenological standpoint,
but it is hard-wired into the IPA method. IPA often appeals to newcomers to
qualitative research, because it is an accessible, systematic, and practical approach to 
collecting and analyzing phenomenological data. It articulates the steps involved in 
conducting an investigation, for example, how to generate meaningful lower order 
and higher order categories from the data. Smith et al. (2009) set out the basis of the 
method, illustrating its steps using examples of data drawn from clinical and health 
psychology. 

Generic Thematic Analysis 

Finally, there are some generic approaches to thematic analysis that do not have a 
“brand name” attached to them, for example those described by Boyatzis (1998) and 
by Braun and Clarke (2006, 2013). Braun and Clarke’s (2006) paper is a particularly 
clear exposition of the thematic analysis method, which the authors intend as an 
“accessible and theoretically flexible approach” (p. 77). It is capable of use within a 
number of epistemological stances, including the phenomenological and social constructionist
ones.


Narrative Approaches 

Many writers have proposed that narrative is fundamental to human communication 
and experience (e.g., Bruner, 1991; McLeod, 1997; Murray, 2003; Polkinghorne,
1988; Sarbin, 1986). Indeed, the first sentence of Chapter 1 of this book starts us off 
on a narrative journey. Stories surround us, and it is therefore important that research 
methods be developed to help make sense of them. 

Narrative Analysis

There are a number of different versions of narrative analysis, both qualitative and 
quantitative (Avdi & Georgaca, 2007; Gonsalves & Stiles, 2011; Riessman, 2008). 
Their common feature is that they take stories as the raw data for the research, and set 
out to analyze the properties of stories in the context in which they occur. Various 
kinds of culturally defined narratives may be of interest, such as narratives of illness 
and recovery, victimization, identity, and faith journey. 

Several different formulations of narrative structure have been proposed (for a 
review, see McLeod, 1997), but the most basic consists of three elements. First, there 
is a beginning, in which the setting is described (e.g., “When I was 18, still living at 
home ...”), the main character is introduced (“I had a friend Angel who I used to 
visit ...”), and a situation or problem is introduced (“... and Angel got cancer”). 
Second, a series of actions, obstacles, conflicts, reactions, and attempted solutions is 
described, often leading to a climax or turning point. Third, there is an ending or 
resolution to the story, often with some attempt to state the point or the person’s 
current perspective (“Anyway, I still think about her; 29 is too young to die!”). 

A good example of a qualitative approach to narrative analysis is Humphreys’s 
(2000) analysis of the stories told in Alcoholics Anonymous (AA) group meetings. 
AA meetings are peer-led mutual support group meetings in which members’ narratives
play a central role. Humphreys used qualitative observation of the meetings (see
Chapter 7) in order to identify the prototypical stories. He arrived at a set of five 
types - which he labeled drunk-a-logs, serial stories, apologues, legends, and 
humorous stories - identifying the characteristic properties of each and their function 
in the context of the AA meetings. 

Life History Research 

One important type of story, possibly the most important type of story, is the story 
we tell about our own, or other people’s, lives: the biography, autobiography, or life 
history. Curiously, psychology has barely taken on board the study of whole lives. 
Almost all psychological research looks at small slices of behavior, and rarely places 
them into the context of the whole person and how they have lived their life over 
time. (One major exception to this is the extended case history, especially as practiced
by the masters of the genre, starting with Freud’s early work at the beginning of
the last century.)

Formal life history research aims to build an understanding of individual lives over 
time. One moving example is Bogdan and Taylor’s (1976) study of an articulate 
“mentally retarded” man (“person with intellectual disabilities” in the UK), “Ed 
Murphy”, who describes his life in state institutions. This account both demonstrates 
the trials of living as a marginalized, institutionalized person, and also documents the 
existence of perceptive awareness in people whose voices are rarely listened to. 


Language-Based Approaches 

Language-based approaches focus on text (a general term used to encompass all kinds 
of verbal communication, such as interviews, letters, diaries, official documents). 
They tend to draw on social constructionist ideas - they are more interested in the 
character of the language usage and what it accomplishes, and tend to be less interested
in the inner world of the person who produced the text. The most popular
approach with psychologists is discourse analysis; other approaches include 
conversation analysis, critical and feminist approaches, and deconstructionism. 

Discourse Analysis 

There are many kinds of discourse analysis (Potter, 2012; Wetherell, Taylor, & Yates, 
2001); it is an interdisciplinary field spanning psychology, sociology, communication 
science, linguistics, and literature. Within psychology, the most popular approach is 
that articulated by Potter and Wetherell (1987). In general, discourse analysis involves 
rigorously examining texts in order to analyze the repertoires of discourse that a 
speaker is drawing upon, and the kinds of “subject positions” that the speaker is 
adopting (see box for an example). 


Example of discourse analysis: 

Madill and Barkham (1997) examined the transcripts of a single case of time- 
limited psychodynamic therapy. They showed how, during the course of the 
therapy, the client took on three different subject positions - which they labeled
as the dutiful daughter, the bad mother, and the damaged child - and the discourses
she drew upon that exemplify each of these. For instance, they
argued that the dutiful daughter position “draws on 18th and 19th century 
discourses of female subjectivity. During this period, subject positions were 
provided for women based primarily upon their domesticity...” (p. 242). Thus 
they were able to analyze the client’s talk within the context of its historical and 
social antecedents. 


So-called critical approaches (Hepburn, 2003) often use forms of discourse analysis 
to examine how language perpetuates power differentials, for example in terms of 
social class, race and ethnicity, gender, or sexual orientation. These approaches include 
some branches of feminist research, neo-Marxist approaches, and emancipatory 
approaches, such as Freirian and Foucauldian research (Hepburn, 2003; Lather,
1991; Sprague, 2005). For example, critical feminist approaches to alcohol use have
looked at the gendered nature of discourses about female versus male alcohol consumption
(e.g., Lyons & Willott, 2008).

Conversation Analysis 

Although sometimes grouped with discourse analysis, conversation analysis has its 
own tradition and its own particular methods. An outgrowth of the work of sociologists
Garfinkel and Goffman, it was developed by Harvey Sacks (1995) as a rigorous
method for identifying the common conversational sequences and strategies used by 
people to carry out everyday tasks. Conversation analysis attempts to study how 
speakers perceive each other’s utterances, based on how they respond to each other. 
Although it attempts to develop general models of the strategies people use to
accomplish practical work in conversation, it emphasizes the ad hoc, contextually
embedded nature of “talk-in-interaction” (Schegloff, 1999). Over the past 40 years,
conversation analysis has built up a large repertoire of provisional understandings of 
everyday and professional speech, including many investigations of psychotherapy 
(Peräkylä, Antaki, Vehviläinen, & Leudar, 2008; ten Have, 2007).

One interesting clinical application of conversation analysis is McCabe, Heath, 
Burns, and Priebe’s (2002) study of the interaction between psychiatrists and patients 
with psychosis. They found that although the patients made active attempts to engage 
their psychiatrists in discussion about the content of their beliefs, the psychiatrists 
seemed uncomfortable and tended to disengage or otherwise avoid the topic. 

Deconstructionism

Finally, deconstructionist researchers engage in self-critique, embracing a
postmodern view of the research process. They see the major task of researchers as being
“deconstruction” of the cultural, social, or epistemological assumptions of their work
and that of others. They embrace radical pluralism, and attempt to speak or give air
to multiple voices while eschewing any attempt to bring these voices together into a
single message. In essence, they attempt to mirror fragmented, postmodern,
multicultural society in their research. For example, a deconstructionist researcher such as
Lather (1991) might present her findings as a kind of research collage. 

Perhaps most importantly, deconstruction is an essential component of the process 
of evaluating research, in which one attempts to identify the implicit assumptions that 
drive a research study. Slife and Williams (1995) provide an excellent introduction to 
this approach. In our view, deconstructionism is less useful as a primary research 
method than as a method for reflecting on and critiquing research. This issue will be 
taken up in the last section of this chapter. 

Ethnographic Approaches 

As discussed at the beginning of this chapter, ethnography was the earliest form of 
systematic qualitative research. Initially developed within anthropology, and later 
within sociology, its principal applications continue to be within those fields. However, 
some psychologists have conducted ethnographic work, and it is an approach that has 
potential for use within the field (Suzuki, Ahluwalia, Mattis, & Quizon, 2005),
although it is not an easy one to learn or carry out: ethnography is an art that
requires many dedicated hours of the researcher’s time.

The essence of ethnography is to spend time in the field setting, conducting
participant observation (see Chapter 7) and talking with the people there. As soon as
possible after each session, the researcher writes extensive field notes. From the mass 
of field notes accumulated in the study, the underlying social rules and norms that 
govern interaction in that culture are distilled (Emerson, 2001). 

A recent variant is focused ethnography, also known as applied ethnography 
(Knoblauch, 2005; Savage, 2006; Simonds, Camic, & Causey, 2012). This is a smaller-
scale enterprise than general ethnography, which makes it more amenable to use by
clinical psychologists. It sets out to answer a specific research question, say about the
functioning of a particular service, and observations and interviews are focused around
that question. For example, Alcock, Camic, Barker, Haridi, & Raven (2011) used this
method to examine the impact of a community-based intergenerational project, that
is, a project that involved bringing together older people and younger people on a 
deprived London housing project. The ethnography documented the outcomes that 
occurred in both groups of participants, such as a reduction in age group stereotyping 
and an enhanced sense of community. 


WAYS OF EVALUATING QUALITATIVE STUDIES 

As must be obvious from the above discussion, the traditional psychometric criteria of 
reliability and validity do not easily carry over to qualitative approaches. The concepts 
of face and content validity can be used without much stretching, and a case can be 
argued for adapting some of the other concepts. However, it appears that a more 
fruitful approach is to articulate specific criteria for evaluating qualitative studies. 
Several scholars have attempted to do this (e.g., Elliott, Fischer, & Rennie, 1999; 
Mays & Pope, 2000; Morrow, 2005; Stiles, 1993, 1999; Yardley, 2000), although 
some doubts have also been expressed about the usefulness of specifying criteria for 
qualitative research (Barbour, 2001; Reicher, 2000). 

We are partial to the Elliott et al. (1999) version, not only because Elliott is a 
co-author of this text, but also because their guidelines, although having a broad 
applicability, were mainly developed and published within a clinical psychology
context. Elliott et al. were attempting to help journal reviewers and editors evaluate
qualitative studies that have been submitted for publication, but their framework is 
relevant to any readers of qualitative studies, as well as to researchers themselves. They 
describe some common guidelines shared by both quantitative and qualitative 
approaches, for example respect for participants and use of appropriate methods, and 
then guidelines specific to qualitative approaches (see Table 5.1). They describe each 
one and then give examples of good and bad practice under each. In summary, their 
guidelines for qualitative studies are: 

Owning one’s perspective. The authors describe their theoretical orientations and 
biases, in order to help readers evaluate the researchers’ interpretation of the data. For 
example, they would state if they were coming to the research from a psychoanalytic, 
or from a feminist, perspective. 

Situating the sample. The authors describe the research participants so that readers 
can judge how widely the findings might apply. 

Grounding in examples. The authors provide enough examples of their raw data to 
illustrate the analytic procedures used and to allow the reader to evaluate their
findings. They also stay close to the data; any speculations that exceed the data are clearly
labeled as such. 

Providing credibility checks. The researchers use methods for checking the credibility 
of the results, for example, analytic auditing (e.g., using multiple researchers or an 
additional person who checks the results against the data), triangulation (examining 
the phenomenon from multiple, varied perspectives) and testimonial validity (checking 
the results with the original informants or similar others). 
Table 5.1

Summary of Elliott et al.’s (1999) evolving guidelines

A. Publishability guidelines shared by both qualitative and quantitative approaches
   1. Explicit scientific context and purpose
   2. Appropriate methods
   3. Respect for participants
   4. Specification of methods
   5. Appropriate discussion
   6. Clarity of presentation
   7. Contribution to knowledge

B. Publishability guidelines especially pertinent to qualitative research
   1. Owning one’s perspective
   2. Situating the sample
   3. Grounding in examples
   4. Providing credibility checks
   5. Coherence
   6. Accomplishing general versus specific research tasks
   7. Resonating with readers

Note. This table has been reproduced with permission from the British
Journal of Clinical Psychology © The British Psychological Society.


Coherence. The interpretation of the data is coherent and integrated, but at the 
same time it does not oversimplify the data. 

Accomplishing general versus specific research tasks. If the research aims to achieve a 
general understanding, then the appropriate range of people or situations is sampled. 
If it aims to achieve a specific understanding of a particular case, that case is described 
thoroughly enough for the reader to gain a full understanding. 

Resonating with the reader. From the point of view of the reader, the results are not 
only believable but seem to capture or make sense of the phenomenon, enabling the 
reader to understand the phenomenon more fully. 


CONCLUSION: CHOOSING AND COMBINING METHODS 

Qualitative methods have now become much more fully accepted within psychology, 
and the heat seems to be going out of the old polarized quantitative versus qualitative 
debate. Researchers, and research methodologists, are now focusing their attention on 
when best to use either a quantitative or a qualitative approach, what is the appropriate 
qualitative method for any given research question, and how best to appraise qualitative 
studies. However, given that we have now described the fundamentals of both
quantitative and qualitative approaches, it is worth briefly considering how researchers might
decide between them, and how they might be combined in a study or research program. 

We espouse the notion of methodological pluralism: that different research methods
are appropriate for different types of research question (see Chapter 3). For example,
qualitative methods are good for descriptive questions within a discovery-oriented
framework, for instance, when you are trying to learn about a phenomenon that has
not been previously researched. Quantitative methods are good for delimited
questions of covariation and comparison, for instance, looking for relationships between
variables and for investigating causality.

On the other hand, all methods have weaknesses or limitations, so, if possible, it is 
better to use multiple methods of measuring important variables, an approach known 
as triangulation (Creswell, 2009). In other words, it is unwise to rely solely on one
perspective, source or approach (Campbell & Fiske, 1959; Cronbach & Meehl, 1955; 
Patton, 2002), because all of these have their limitations. For example, in psychotherapy 
outcome research it is useful to assess client change from the perspective of the client, 
the therapist, and a clinical interviewer. Moreover, a qualitative study focusing on how 
change occurred would complement the quantitative data. Several recent papers have 
called for randomized controlled trials (RCTs: see Chapter 8) to be augmented by 
qualitative studies using a subsample of the same RCT participants (e.g., Hill, Chui, & 
Baumann, 2013; Lewin, Glenton, & Oxman, 2009; Midgley, Ansaldo, & Target, 2014). 

Clinical psychology may be gradually entering a more pluralist phase for pragmatic 
reasons. A variety of publications have urged psychologists to adopt a qualitative 
approach to research (e.g., Camic, Rhodes, & Yardley, 2003; Smith, 2008; Willig, 
2013). However, the acid test (whether qualitative studies get published in
prestigious journals) still reveals a strong quantitative bias in the field (Rennie, 2012).
There appears to be a residual attitude that qualitative methods are second class: the
saying of Rutherford, the eminent physicist, that “qualitative is bad quantitative”
(quoted in Stewart, 1989, p. 219) expresses this viewpoint succinctly. However, one
sign that a pluralist attitude may be taking root is the interest among the newer
generation of researchers. Qualitative methods seem to appeal particularly to graduate
students in clinical and counseling psychology, because they allow much closer contact 
with clinical phenomena. In the institutions we are familiar with, an increasing number 
of dissertations and theses now employ qualitative methods, perhaps so much so that 
there is a danger in some places that traditional quantitative skills are no longer being 
acquired. We believe that clinical psychologists should be competent in both 
quantitative and qualitative methods. 

There has been a recent upsurge of interest in pluralistic and mixed-method 
research, particularly research that combines both quantitative and qualitative methods 
(see, e.g., Barker & Pistrang, 2005; Creswell, 2009; Tashakkori & Teddlie, 2009).
Quantitative and qualitative approaches can often complement each other, in a single 
study or within a larger program of research. The different components can occur in 
various ways, such as: 

1. Beginning research in a new area with qualitative studies, either pilot research or 
more elaborate qualitative investigations. 

2. Building quantitative studies on earlier qualitative research. 

3. Using qualitative methods such as interviews and focus groups to develop 
quantitative measures. 

4. Using qualitative data to elucidate or explore quantitative findings, either as an 
adjunct to a primarily quantitative study or as a follow-up investigation. 
5. Using quantitative data to elucidate qualitative findings, that is, the reverse of
point 4, often found in sociology articles. 

6. Developing mixed-methods approaches that combine both kinds of data in a 
complementary fashion in the same study (e.g., case studies by Elliott et al., 2009; 
Parry, Shapiro, & Firth, 1986). 

7. Carrying out separate qualitative and quantitative studies of the same participants, 
either to address different questions, or to address the same question from
different angles (e.g., Klein & Elliott, 2006; Madill & Barkham, 1997; Patton, 2002).

As we hope to have made clear, choosing the approach, or combining approaches, 
depends largely on the question you are trying to answer. The next two chapters 
examine practical issues in selecting and constructing measures, covering the two 
major approaches to psychological measurement: self-report and observation, looking 
at each from both qualitative and quantitative points of view. 

We hope that this chapter has given readers a taste of the range of available qualitative 
methods, and an understanding of their underlying philosophies, particularly the
distinction between phenomenological and constructionist methods. This chapter has
been mostly theoretically oriented; Chapter 12 will look in more detail at practical
issues in analyzing qualitative data. 


CHAPTER SUMMARY 

Qualitative approaches use language as their raw material, in order to examine the 
participants’ thoughts, feelings, behavior, or linguistic strategies. Their main advantage 
is that they allow a rich description of the phenomena in depth and detail, sometimes 
called “thick description” (Geertz, 1973). There are two broad philosophical traditions 
underlying qualitative research: phenomenology and constructionism. Phenomenologists 
attempt to understand the person’s perceptions and experiences, their inner world, 
whereas constructionists focus on how language is used in social interactions. The main 
approaches to qualitative research can be grouped into four families: thematic analysis, 
narrative analysis, language-based approaches, and ethnography. The criteria of
reliability and validity do not translate easily to qualitative research, but it is nevertheless
possible to specify criteria for how qualitative research studies can be evaluated. At the 
same time, there appears to be increasing scope for combining qualitative and 
quantitative methods, both across and within studies. 


FURTHER READING 

Many treatments of qualitative methods have recently been published. Camic et al. 
(2003), Smith (2008), and Willig (2013) give accessible accounts of the theory and 
practice of the commonly used approaches. McLeod (2011) has a clear and thorough 
account of the application of qualitative methods to psychological therapy and 
counseling research. Pistrang and Barker (2012) explain how to choose a particular 
qualitative approach, using a running example to illustrate when each is suitable. 
Braun and Clarke (2005) have a good description of general thematic analysis,
possibly the most commonly used approach. Bruner (1991) on narrative is erudite and
thought-provoking. 

Some of the old stalwarts, such as Lincoln and Guba (1985) and Patton (2002), 
still hold up, and Taylor and Bogdan’s (1998) sociologically oriented text includes 
some illustrative studies, although their current edition omits the single-case account 
of “Ed Murphy” (originally published in Bogdan & Taylor, 1976), discussed above. 
For more extensive but accessible treatments of the quantitative versus qualitative 
debate, see Bryman (1988), Polkinghorne (1983), and Proctor and Capaldi (2006). 
Since many qualitative approaches have their roots in literary theory, it is also worth 
reading about them in that context. Eagleton (2008) gives an excellent exposition 
and critique of, among other things, phenomenology, hermeneutics, and
poststructuralism as applied to the analysis of literary texts.


QUESTIONS FOR REFLECTION 

1. Which philosophical position do you prefer: phenomenology or social
constructionism? Neither? Why?

2. Do you think it is ever possible to set aside one’s assumptions? Is this what 
“bracketing” refers to? 

3. There is a huge variety of labels for different types of qualitative research. How 
similar or different do you think the various approaches are? 

4. It could be argued that guidelines for qualitative research are features of good 
research practice in general. What do you think? 
KEY POINTS IN THIS CHAPTER

• Self-report methods, such as interviews and questionnaires, ask the person 
for information directly. 

• Their advantage is that they give you the person’s own perspective; their 
disadvantage is that there are potential validity problems (e.g., people may 
deceive themselves or others). 

• The main qualitative self-report approach is the semi-structured interview. 

• Qualitative interviewing is a distinct skill, related to but different from 
clinical interviewing. 

• The main quantitative self-report approach is the written questionnaire, but 
structured interviews and internet surveys are also used. 

• There are several principles to follow in constructing quantitative self-report 
instruments. 

• Response sets, such as acquiescence and social desirability, refer to tendencies 
to respond to items independently of their content. They need to be taken 
into account when designing and interpreting self-report measures. 


When you want to know something about a person, the most natural thing is to ask. 
Research methods that take the approach of asking the person directly are known as 
self-report methods, and mainly take the form of interviews, questionnaires, and rating
scales. They are the most commonly used type of measure in the social sciences in 
general and in clinical psychology in particular. 


Research Methods in Clinical Psychology: An Introduction for Students and Practitioners , 
Third Edition. Chris Barker, Nancy Pistrang, and Robert Elliott. 

© 2016 John Wiley & Sons, Ltd. Published 2016 by John Wiley & Sons, Ltd. 
Companion Website: www.wiley.com/go/barker 
For example, suppose that you have set up a new counseling service for adolescents and
want to evaluate its effectiveness. You ask the service users to rate the severity of their
problems before and after counseling and how satisfied they are, using standardized instruments.
You also use a semi-structured interview to assess the adolescents’ overall experience of the
service, including what they feel has changed, what they found helpful, and any specific
criticisms they had of it. Client satisfaction studies and clinical effectiveness studies like this 
have become important, with the increased emphasis on accountability to service users. 

Rather than asking the person directly, you may instead, or in addition, ask someone
who knows the person, such as a friend, family member, or therapist (Connelly & Ones,
2010). This is often called using an informant (a term which has unfortunate
connotations of sneakiness). It allows you to get the views of someone who knows the person
well and who has greater opportunity than you to observe him or her in a natural
setting. It is also useful when the respondent cannot give you reliable information. For
example, in research with children, it is often useful to have the parents’ and the
teacher’s views of the relevant behavior. This is why, as we discussed in the previous chapter,
a more accurate term would be “verbal-report” rather than “self-report.” However, the 
term “self-report” is commonly used to cover reports from both the person of interest 
and from other respondents, and we will retain that usage here. 

Advantages and Disadvantages 

The great advantage of self-report is that it gives you the respondents’ own views 
directly. It gives access to phenomenological data, that is, respondents’ perceptions of
themselves and their world, which are unobtainable in any other way. As Kvale and
Brinkmann (2009) put it, “If you want to know how people understand their world and
their lives, why not talk with them?” (p. xvii). Furthermore, self-report methods can be 
used to obtain information in situations where observational data are not normally 
available, for instance, for studying life histories, or behavior during a major disaster. 

The main disadvantage of self-report is that there are a number of potential validity 
problems associated with it. The data are personal and idiosyncratic and thus may bear 
little relationship to “reality,” as seen by you or others. Moreover, people are not
always truthful. They may deceive themselves, such as when an alcoholic cannot admit 
his dependency to himself, or they may deceive the researcher, such as when a young 
offender does not want to reveal his socially undesirable thoughts or behavior. 
Furthermore, research participants may not be able to provide the level of detail, or 
use the concepts, that the researcher is interested in. This is especially the case when 
people have limited expressive language, for instance, young children or people with 
neurological conditions such as dementia or aphasia. 

Arguments arising in social psychology, psychoanalysis, and cognitive psychology 
cast doubt upon the validity of self-reports. From the social psychological
perspective of attribution theory, Nisbett and his colleagues (e.g., Nisbett & Ross, 1980;
Nisbett & Wilson, 1977) have argued that people often do not know what
influences their behavior, and that there are pervasive biases in the way that we account
for our own and others’ behavior. One common source of bias, known as the
actor-observer effect, is the tendency for people to say that their own behavior is caused
by situational factors and that other people’s behavior is caused by dispositional
factors (Fiske & Taylor, 2013; Jones & Nisbett, 1971). For example, a student
might say that she failed an exam because she slept badly the night before, whereas
she might say that her roommate failed the exam because she was too lazy to study
for it. Another related type of bias, known as self-serving bias, is the tendency to
take credit for success and deny responsibility for failure (Fiske & Taylor, 2013).

Psychoanalysts similarly emphasize the limits to the person’s conscious self-knowledge. 
They argue that many important feelings and experiences are unconscious, and prevented 
by defenses such as repression or denial from becoming conscious. Thus, a person’s 
accounts cannot be taken at face value. Some psychoanalytically oriented researchers 
prefer projective measures, principally the Thematic Apperception Test (TAT), the 
Rorschach inkblot test, and sentence completion methods, which are designed to assess 
the person’s unconscious thoughts and feelings, although the validity of these measures 
can also be hard to establish (Meyer, 2004; Westen, Feit, & Zittel, 1999).

Cognitive psychologists focus on the complex mental processes involved in 
responding to self-report questions, particularly quantitative ones. There is general 
agreement that responding involves four separate cognitive components:
understanding the question, recalling information from memory, integrating the
information, and formulating a response (Tourangeau & Bradburn, 2010). Each of 
these steps is prone to error or bias. These problems particularly apply to retrospective 
self-report questions, where the respondent is required to recall past behaviors, 
thoughts, or feelings (e.g., “How many alcoholic drinks did you consume last 
week?”); such recall is prone to various limitations of memory retrieval (Piasecki, 
Hufford, Solhan, & Trull, 2007). 

These strictures about the limits of self-report methods are important to bear in 
mind. However, this does not mean that all self-report data are invalid, only that they 
cannot be trusted in all cases (Ericsson & Simon, 1993). All measurement methods 
have their drawbacks, and the potential limitations of the data must be considered at 
the analysis and interpretation stage. Thus, we should not abandon this method of data 
collection, although it is often advisable to supplement self-report data with
observational data (or at least self-report data from other perspectives). In addition, it is a good
idea to be sensitive to the possibilities for self-deception in verbal protocols (see 
Churchill, 2000, for an example of “seeing through” self-deceptive self-reports). 

Constructing an interview or questionnaire may appear to be straightforward, but the 
apparent simplicity is deceptive. Most people have been on the receiving end of an 
irritating, poorly designed questionnaire or interview, often in the context of market 
research. Designing good self-report measures is an art and a craft. For this reason, it is 
preferable, where possible, to use well-designed established measures rather than 
attempting to design your own from scratch. There is a huge literature on research
interviews and questionnaires, including many books (e.g., Bradburn, Sudman, & Wansink,
2004; Dillman, Smyth, & Christian, 2009; Josselson, 2013; Kvale & Brinkmann, 2009).

Terminology 

An interview is a special type of conversation aimed at gathering information, although 
the interviewer usually has a written guide, known as an interview protocol or interview 
schedule. (Note that the interview protocol is not the same thing as the research 
protocol, which refers to the plan for the study as a whole, including, for example, the 
research design and the sampling procedure.) Interviews are usually conducted face to face, 
although they may be done over the telephone or over the internet. 



A questionnaire, on the other hand, refers to a structured series of written questions, 
which usually generate written responses. Checklists and inventories (the terms are used 
almost interchangeably) are a type of questionnaire that presents a list of items in a 
similar format and asks respondents to rate all that apply to them. Two widely used 
examples of inventories are the Generalized Anxiety Disorder scale (GAD-7: Kroenke,
Spitzer, Williams, Monahan, & Löwe, 2007), a 7-item scale assessing anxiety, and
the CORE Outcome Measure (CORE-OM: Evans et al., 2002), a 34-item inventory
measuring the number and frequency of psychological symptoms. Questionnaires may 
be composed of several subscales, each of which measures an internally consistent 
construct (such as the Well-being, Problems, Functioning, and Risk subscales of the 
CORE-OM), although the subscales may often overlap with each other. 

The term survey is widely used but imprecisely defined. It usually denotes a 
systematic study of a medium to large sample done either by interview or by postal 
(“mail-out”) or internet questionnaire. A census means a survey of the whole 
population (as opposed to a sample from that population: see Chapter 10); the best 
known example is the government population census. 


Mode of Administration 

Since self-report data may be gathered either by written questionnaires or by interview, 
researchers need to consider which mode of administration would better suit their 
purposes. The advantages of written questionnaires are that: 

• they are standardized (i.e., the wording is exactly the same each time);

• they allow respondents to fill them out privately, in their own time; 

• they can be used to ensure confidentiality, via a code numbering system, and so 
they can potentially cover embarrassing, socially undesirable, or illegal topics (e.g., 
sexual behavior or drug use); and 

• they are cheaper to administer. 
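As a rough illustration of how a code numbering system can protect confidentiality, the Python sketch below gives each participant a random code, stores questionnaire answers under that code only, and returns a separate linkage key that could be held securely or destroyed. This is our own hypothetical example, not a prescribed procedure. The function names, the "P" code prefix, and the participant data are all assumptions made for illustration.

```python
import secrets


def assign_codes(participant_names):
    """Give each participant a unique random code (hypothetical scheme).

    Returns the linkage key: a dict from name to code. In practice this key
    would be stored separately from the data, under restricted access.
    """
    key = {}
    used_codes = set()
    for name in participant_names:
        code = "P" + secrets.token_hex(3).upper()  # "P" plus 6 hex characters
        while code in used_codes:  # regenerate on the rare collision
            code = "P" + secrets.token_hex(3).upper()
        used_codes.add(code)
        key[name] = code
    return key


def deidentify(responses_by_name, key):
    """Re-key questionnaire responses by code so the data file holds no names."""
    return {key[name]: answers for name, answers in responses_by_name.items()}


# Usage with made-up participants and answers
linkage_key = assign_codes(["Participant A", "Participant B"])
data_file = deidentify(
    {"Participant A": [3, 1, 4], "Participant B": [2, 2, 5]}, linkage_key
)
# data_file now contains only codes and answers; linkage_key is kept elsewhere.
```

Keeping the linkage key and the data in separate places means that anyone who sees the data file alone cannot tell who gave which answers.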

The advantages of interviews are that they can use the rapport and flexibility of the 
relationship between the interviewer and the respondent to enable the interviewer to: 

• ask follow-up questions, in order to clarify the respondent’s meaning, probe for 
material that the respondent does not mention spontaneously, and get beyond 
superficial responses; 

• ensure that the respondent answers all the questions; 

• give more complicated instructions and check that they are understood; 

• vary the order of the questions; 

• allow the respondents to ask their own questions of the interviewer; and 

• allow researchers to gather enough information to make judgments about the 
validity of the respondent’s self-report. 

Interviews are additionally appealing to clinical psychologists because their clinical 
skills can be used. However, clinicians also have some unlearning to do, as conducting 
a research interview is quite different from conducting a therapeutic or assessment 
interview (we will elaborate on this point later in this chapter). 
Open-ended and Closed-ended Questions

Self-report methods can yield either qualitative or quantitative data, depending largely 
on whether open-ended or closed-ended questions are used. 

Open-ended questions are those that do not restrict the answer, which is usually 
recorded verbatim. For example, the question “How are you feeling right now?” 
might yield the responses “Fine, thanks,” “Like death warmed up” or “Better than 
yesterday, at least.” However, content analysis may be used at a later stage to classify 
the responses (e.g., into positive, negative, or neutral). Also, some open-ended 
questions may yield quantitative data (e.g., “How old are you?”). 

The advantages of open-ended questions are that they enable the researcher to study 
complex experiences: respondents are able to qualify or explain their answers and also 
have the opportunity to express ambivalent or contradictory feelings. Furthermore, 
their initial responses are potentially less influenced by the researcher’s framework. 
Respondents are free to answer as they wish, using their own spontaneous language. 

The main disadvantage of open-ended questions, from the researcher’s point of 
view, is that it is more difficult to evaluate the reliability and validity of verbal data. 
It is hard to ascertain the extent of such potential problems as interviewer bias and 
variability, and respondent deception, exaggeration, fabrication, and forgetting. It is 
not that the reliability and validity of qualitative self-report measures are inherently 
worse, they are just harder to evaluate, so that both the researchers and the readers 
are more likely to feel on shaky ground. (On the other hand, careful examination of 
the respondent’s manner and word choice can provide important hints about the 
credibility of verbal data.) 

A second issue is that open-ended questions typically generate large amounts 
of data (the “data overload” problem; Miles, Huberman, & Saldaña, 2013),
which are usually time-consuming to analyze. For a start, most qualitative
interviews need to be transcribed, which often takes considerable effort (this is where
having sufficient funding to pay for transcription can save the researcher time and 
frustration). Furthermore, the analysis itself requires effort and skill. This will be 
considered further in Chapter 12, where we cover the analysis and interpretation 
of qualitative data. 

A final issue is that open-ended questions tend to produce great variability in the 
amount of data across respondents. Verbally fluent respondents may provide very full 
answers, while less fluent respondents may find open-ended questions demanding to 
answer, and give very terse responses. In particular, open-ended questions in written 
questionnaires are often left blank, because they require more effort to complete. 

Closed-ended questions constrain the answer in some way. Answers are usually 
recorded in an abbreviated form using a numerical code. For instance, the possible 
responses to the closed question “Are you feeling happy, sad, or neither, at the 
moment?” might be coded as 1 = “Happy,” 2 = “Sad,” and 3 = “Neither/Don’t 
know.” Responses can be made in the form of a dichotomous choice (i.e., when there 
are two possible responses, such as Yes/No), a multiple choice (i.e., where the 
respondent has to choose one response from several possibilities), a rank ordering (i.e., 
where a number of alternatives have to be put in order of preference or strength of 
opinion), or ticking one or more applicable items on a checklist. 
The advantages of closed-ended questions are that the responses are easier to analyze, 
quantify, and compare across respondents. They also help to prompt respondents about 
the possible range of responses. 
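
To make the coding step concrete, here is a minimal sketch in Python (our choice for illustration; no particular software is implied by the text). It assumes the coding frame for the example question above and uses invented answers from five hypothetical respondents; the function names `code_responses` and `tabulate` are ours. Once the answers are numbers, counting and comparing them across respondents becomes trivial, which is the practical advantage just described.

```python
from collections import Counter

# Coding frame for the example question in the text:
# "Are you feeling happy, sad, or neither, at the moment?"
CODES = {
    "Happy": 1,
    "Sad": 2,
    "Neither/Don't know": 3,
}


def code_responses(responses):
    """Convert each ticked response label into its numeric code.

    An unknown label raises KeyError, so data-entry errors are flagged
    rather than silently dropped.
    """
    return [CODES[label] for label in responses]


def tabulate(codes):
    """Count respondents per code, listing every code even if nobody chose it."""
    counts = Counter(codes)
    return {code: counts[code] for code in sorted(CODES.values())}


# Hypothetical answers from five respondents (invented for illustration).
answers = ["Happy", "Sad", "Happy", "Neither/Don't know", "Happy"]
coded = code_responses(answers)
print(coded)            # [1, 2, 1, 3, 1]
print(tabulate(coded))  # {1: 3, 2: 1, 3: 1}
```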

The major disadvantages of closed-ended questions are succinctly summarized by 
Sheatsley (1983, p. 197): “People understand the questions differently; respondents 
are forced into what may seem to them an unnatural reply; they have no opportunity 
to qualify their answers or to explain their opinions more precisely.” For example, in 
research on stressful life events, information from a checklist measure simply tells you 
whether an event has occurred, but you have no information about the meaning of the 
event for the individual. “The death of a pet” might mean that the goldfish passed 
away, or that an elderly person’s sole companion has died. A semi-structured life events 
interview (e.g., that of Brown & Harris, 1978) allows the interviewer to probe further 
in order to establish the meaning and significance of each reported event. Furthermore, 
interview or questionnaire studies that consist entirely of closed questions can be an 
annoying experience for respondents, as they may feel that they are not getting a 
chance to put their views across, and may resent being controlled by the format. 

The following sections examine qualitative and quantitative methods in turn. This 
structure is mainly for didactic purposes: we do not wish to artificially polarize the two 
types of method. In practice, there is a continuum, ranging from unstructured, 
open-ended methods, through semi-structured interviews or questionnaires, to structured 
quantitative methods. As we will state repeatedly, it is possible, and often desirable, to 
combine both qualitative and quantitative procedures within the same study. 


QUALITATIVE SELF-REPORT METHODS 

For illustrative purposes, we will discuss qualitative self-report methods mostly in the 
context of the qualitative interview, since the interview is the most frequently used 
method within the qualitative tradition. However, there are various other qualitative 
self-report methods, such as: (1) open-ended questionnaires, for example, the Helpful 
Aspects of Therapy form (Llewelyn, 1988); (2) personal documents approaches, which 
use pre-existing written records, such as personal journals (Taylor & Bogdan, 1998); 
and (3) structured qualitative questionnaires, for example, the repertory grid (Kelly, 
1955), although repertory grids are often analyzed quantitatively (see Winter, 2003). 


• The semi-structured qualitative interview is the most common qualitative 
self-report method. 

• It is usually based on an interview schedule, which lists the major questions 
to be asked and some possible probes to follow up with. 

• The interview style is mostly based on open-ended questions, but can use 
other active listening responses, such as reflections. 

• The interviewer should have an interested stance with a kind of free-floating 
attention, and guard against putting words into the respondent’s mouth. 
Types of Qualitative Interview 

In addition to using open-ended questions, qualitative interviews are usually loosely 
structured, and aim to get an in-depth account of the topic (Josselson, 2013; Kvale & 
Brinkmann, 2009; Rubin & Rubin, 2005). They have similarities to psychological 
assessment and to journalistic interviews (but also important differences, which we 
will discuss below). 

There are several different forms of qualitative interview (Patton, 2002). The most 
common is the semi-structured interview. Such interviews can vary in length from a 
few minutes to many hours and can take place on one occasion or across many 
occasions. Typically, qualitative interviews last around an hour (both parties tend to 
get tired by longer interviews). At the upper end, intensive life-story interviewing, 
described by Taylor and Bogdan (1998), may involve many interviews totaling up to 
50 or 120 hours of conversation. 

Alternatives to the semi-structured interview include: (1) the informal or unstructured 
conversational interview, which is most common as an element of participant 
observation; (2) the standardized open-ended interview, which consists of a uniform set of 
questions that are always administered in the same order, often with fixed follow-up 
questions; and (3) the questionnaire-with-follow-up-interview method favored by 
phenomenological researchers of the Duquesne school (e.g., Giorgi & Giorgi, 2003; 
Wertz, 1983). In the last, open-ended questionnaires are used to identify promising 
or representative respondents who are then interviewed in detail. 

One other option is to conduct focus group interviews (e.g., Kitzinger, 1995; Stewart, 
Shamdasani, & Rook, 2007). This method, which originated in market research and 
public opinion polling, involves assembling a small group of respondents. The interviewer 
interacts with the whole group, following the same kind of semi-structured protocol as 
in an individual interview. The group format has the advantage of enabling 
respondents to react to each other’s contributions, and thus possibly to explore the topic 
more deeply. The disadvantages are that the interview is subject to the usual group 
dynamics, such as conformity pressures and a tendency to give more weight to the opinions of 
more vocal or prestigious members, which may affect its validity. 

A note on terminology: we will tend to use the terms “respondent” or “interviewee” 
to refer to the person on the receiving end of the interview. Other possibilities are 
“informant” or “participant.” We avoid the term “subject” because of its connotations 
of powerlessness (see Chapter 10). 

Likewise, there are a number of models of the relationship between the interviewer 
and interviewee. At one end of the range are the traditional “subject” models, in which 
the interviewee is seen as a passive information provider responding to the researcher’s 
questions. At the other end are the various “co-researcher” models, which try to 
minimize the distinction between researcher and “subject” in order to create research in 
which participant and researcher interact as equals. Examples are feminist (Oakley, 
1981; Wilkinson, 1986), new-paradigm (Reason & Rowan, 1981), and participatory 
action (Jason, Keys, Suarez-Balcazar, Taylor, & Davis, 2004) research. 

Some feminist researchers (e.g., Belenky, Clinchy, Goldberger, & Tarule, 1986; 
Carlson, 1972; Riger, 1992; Wilkinson, 1986) see traditional paradigms, where the 
researcher is in charge of the relationship, as replicating patriarchal power relationships. 
Their critique is mostly aimed at quantification and experimental manipulation, but 
it also extends to the more traditional forms of qualitative interviewing (Oakley, 1981). 
They argue that to empower women one must listen directly to what they are saying 
and respond personally without hiding behind the facade of the objective researcher. 
However, other researchers argue that it is mistaken to discard certain methods as being 
insufficiently feminist (Hughes & Cohen, 2010; Peplau & Conrad, 1989). 

Interview Schedule 

The first step is to prepare an interview schedule that lists the important areas to be 
addressed, with some important questions to be asked. The schedule is informed by the 
research questions, the previous literature, and personal experience. It is usually a good 
idea to structure it around some sort of logical framework, which could be, for example, 
conceptual or chronological, but it is important to use the schedule flexibly, not rigidly. It 
is vital to pilot test the interview schedule on a few respondents and revise it accordingly. 

Young and Willmott (1957), in their classic study, Family and Kinship in East 
London, describe the use of their interview schedule: 

We used a schedule of questions, but the interviews were much more informal and less 
standardized than those in the general survey. Answers had to be obtained to all the set 
questions listed (though not necessarily in the same order), but this did not exhaust the 
interview. Each couple being in some way different from every other, we endeavored to 
find out as much as we could about the peculiarities of each couple’s experiences and 
family relationships, using the set questions as leads and following up anything of interest 
which emerged in the answers to them as the basis for yet further questions. (Young & 
Willmott, 1957, p. 207) 

The interview typically starts with general questions, as a warm-up, and then more 
detailed, or more sensitive, questions come later in the interview. The standard 
questions need not be covered in a fixed order, but the interview schedule serves as 
an aide-memoire, to remind you what needs to be asked. The schedule lists the main 
questions, but it is important to use follow-up questions in order to obtain more 
detail as necessary. An example of an interview schedule is given in the box below. 

Sample Interview Schedule 

For illustration, here is part of an interview schedule from a study of peer support for 
women with gynecological cancer (Pistrang, Jay, Gessler, & Barker, 2013): the 
Women Helping Women project. The peer supporters were former patients who had 
been successfully treated for gynecological cancer; they supported more recently 
diagnosed patients by telephone. The interview aimed to understand the impact on 
the peer supporters themselves—what they experienced as beneficial or detrimental— 
and to understand the challenges of delivering peer support. (A separate interview 
focused on the perspective of the woman receiving support.) The excerpt in the box 
includes three central sections of the interview schedule. The first section starts with 
broad questions, in order to get a sense of the respondent’s overall experience before 
moving on to more focused questions which address the main topic of the study. 
An extract from one of the interviews follows later in the chapter. 
Excerpt of interview schedule for peer supporters of women 
with gynecological cancer (Women Helping Women project: 
Pistrang et al., 2013) 

Overall experience of being a peer supporter 

The aim of this introductory section is to get an overall picture of the woman’s 
experience of being a peer supporter, before asking more specifically about processes 
and impact. 

What was it like being a peer supporter? 

What were some of the best things about it for you? 

What were the things you didn’t like, or things that could have been better? 
Was it what you expected? 

Processes of support 

The aim here is to elicit a detailed picture of how peer support operated in practice, 
for example, the nature of the telephone conversations and how the peer supporter 
attempted to be helpful. 

What sort of things did you talk about with the woman you were paired with? 
What were your conversations like? 

In what ways did you try to help? 

What was it about your conversations that you think helped/didn’t help her? 
What difficulties, or dilemmas, did you face in trying to be supportive? 

Impact of peer support on recipient and provider 

The aim here is to elicit the participant’s views about how peer support may have 
helped (or not helped) the woman she was supporting, as well as the impact (positive 
or negative) on the peer supporter herself. 

Do you think the woman you were supporting benefited in any way? 

Were there any ways in which you think it was unhelpful or caused problems 
for her? 

Were there any ways in which you benefited? 

Were there any ways in which it was unhelpful or caused problems for you? 

Notes for interviewer 

Examples of follow-up questions: 

What was that like for you? 

How did that work? 

Can you give an example of that? 

What were you thinking at that point? 
Interviewing Style 

The interviewer’s general stance should be one of empathic and nonjudgmental 
attention, giving the respondent plenty of space to think and talk, and avoiding leading 
questions. If you are unclear about anything, probe further, although legal-style 
interrogation is obviously to be avoided. 

In order to be an effective qualitative interviewer, you must start with an attitude 
of genuine interest in learning from others, in hearing their story, and you must be 
able to listen to them with tolerance and acceptance. The schizophrenia researcher, 
John Strauss, realized after 30 years of quantitative research that he had learned very 
little about the nature of schizophrenia; he felt that he had only really begun to learn 
when he started to listen to what the patients had to say when he asked them about 
their experiences (Strauss, Harding, Hafez, & Lieberman, 1987). 

Qualitative interviewing has increasingly been viewed as a key method for helping 
respondents “tell their stories.” It could be argued that the urge to tell stories is so 
strong that qualitative researchers proceed at their peril if they try to ignore the power 
of narrative. In narrative approaches (see Chapter 5), the interviewer’s main job is to 
help the respondent to tell their story, perhaps beginning with something like, “I 
wonder if you could tell me the story of [e.g., when the depression began] in as much 
detail as you feel comfortable giving me.” Then, the interviewer’s job is to encourage 
the respondent to keep going, or to back up and provide missing information if they 
skip over something important. (Narrative also has therapeutic functions, especially in 
the treatment of traumatic or other difficult life situations; for example, McLeod, 
1997.) However, the interviewer does need to balance the respondent’s need to tell 
their story with their own need to have a clear focus on obtaining material relevant to 
the research questions. 

Your therapeutic skills, such as empathy and clinical intuition, come very much to the 
fore here. However, there must be a clear distinction between research and therapy (or 
clinical assessment) interviews, as all therapeutic orientations involve interventions 
which are inappropriate for qualitative interviewing. For instance, it would be wrong to 
conduct a qualitative interview in cognitive-behavioral style, as this approach, like most 
therapies, is ultimately aimed at changing the client’s thoughts and experiences rather 
than finding out about them. Even client-centered therapists may engage in too much 
paraphrasing, which can easily end up putting words in the respondent’s mouth or causing 
the interview to lose focus. It is also important not to assume too much common 
understanding due to shared culture or experiences. The interviewer needs to take a partial 
step back from what the respondent is saying, taking a stance like the proverbial Martian, 
who knows nothing about how Earthlings conduct their affairs. In general, clinical 
assessment interviews are also quite different from research interviews, as the former 
tend to be aimed at assembling the information into a coherent clinical formulation. 

It is important, for two reasons, to audio-record the interview. First, retrospective 
notes or memory are prone to inaccuracies and incompleteness. Second, extensive 
note taking runs the risk of distracting the respondent and interrupting the flow of 
the interview. Notes may suffice for interviews which are brief and highly structured; 
in such situations, note taking may also be acceptable to the respondent. However, if 
you have to interview without a recorder, your notes need to clearly identify which 
parts are the respondent’s verbatim statements and which are your own summary. 
Written notes can also be used as a reminder during the interview, for example, jotting 
down a particular phrase used by the respondent that you want to return to later on. 
In this case, note taking is brief and should be limited to those essential reminders 
needed to help you conduct the interview. Finally, as we suggested in Chapter 3, it is 
worth keeping a research journal to record your general impressions of each interview. 

Specific Qualitative Interviewing Skills 

If one is genuinely motivated to understand and learn about people by interviewing, 
then a number of technical skills in information gathering and listening become 
useful. One useful way to describe these skills is in terms of what are called “response 
modes” (Goodman & Dooley, 1976), that is, basic types of interviewer speech acts or 
responses. These can be divided into three groups: responses which are essential for 
qualitative interviewing; supplementary responses which are sometimes useful; and 
responses which should generally be avoided. 

Essential response modes. These lean heavily on the “active listening” responses such 
as those made famous by client-centered therapy. Thus, two key responses are open 
questions—to gather information and to encourage the respondent to elaborate— 
and reflections—to communicate understanding and to encourage further 
exploration of content. Questions to guide the discussion (“Could you tell me about ...”) 
are also essential for beginning and structuring the interview, while brief acknowledgments 
(e.g., “I see” or “Uh-huh”) build rapport and help the respondent keep talking. If a 
more active, paraphrasing style is used, you are more likely to need to account for the 
interviewer’s possible influence on the data when you do your analysis. 

As noted earlier, the interview schedule will typically start with some general 
questions in order to get a broad overview of the respondent’s experiences, followed 
by more focused questions addressing the topics of the study. However, it is often the 
unscripted follow-up questions that are important. These are hard to get right, even 
for experienced interviewers. The interviewer needs to decide in the moment what to 
follow up and what not to, given that there won’t be time to follow up on every point. 

Follow-up questions (sometimes called probes) are used to get a detailed, 
fine-grained description of material relevant to the research question, or in cases when the 
respondent’s answer to the initial question is unclear or ambiguous. The interviewer 
needs to listen carefully to what the respondent is saying; it’s useful to pay particular 
attention to the emotional valence of what is being said (not just the content), as this 
often signifies important material. (The use of metaphors is also worth listening out 
for: they often capture essential meanings.) On the other hand, material tangential to 
the research questions, even if it is personally meaningful to the respondent, usually 
does not need to be followed up. Some typical follow-up questions are listed at the end 
of the interview schedule above. They are normally brief and encourage elaboration: 
for example, “Could you say more about that?” or “What was that like for you?” 

Supplementary response modes. In addition, several other types of response can be 
useful, although they should not be overused. For example, closed questions can be 
used to test out ideas near the end of the interview. If you have a hunch about 
something, for example, an idea that has arisen from earlier interviews or background 
reading, you may wish to ask the respondent about this, but it is important to save this 
for late in the interview in order not to “lead the witness.” Other supplementary 
response modes include self-disclosures, which allow the interviewer to explain his or 
her goals for the interview and to build rapport by answering questions about him- or 
herself; and reassurances or sympathizing responses (“It’s hard”), to encourage 
openness in the respondent. 

Responses to be avoided. These include problem-solving advisements, which give 
respondents suggestions about how to solve their problems; interpretations, which try to 
tell the respondent why they did something or what they actually felt; disagreements or 
confrontations, which cut off communication by criticizing or putting the respondent down 
(e.g., do not try to “catch out” respondents in contradictions but instead try to express 
curiosity at the complexity of the person’s experiences); and giving respondents information 
(other than information about the structure and purpose of the interview itself). 

Tracking the Respondent’s Answers 

As Kvale and Brinkmann (2009) caution, it is important for the researcher to track the 
relevance of the respondent’s answers during the interview, in order to make sure that 
the research questions are being answered and the meaning of the respondent’s 
statements is clear. Once the interview is transcribed and you sit down to analyze your 
data, it is generally too late to go back to your respondents in order to ask them to 
clarify what they meant. It is also important to scrutinize the data from your first 
interview before embarking on further interviews. This will make you aware of 
problems with any of the interview questions or with ambiguous or vague answers 
from respondents, so that you can modify your interview schedule and technique. 

Qualitative interviewers are sometimes confronted with apparently contradictory 
information from respondents. This should not necessarily be regarded as evidence of 
unreliability or invalidity. People will often have multiple, sometimes contradictory, 
feelings and views. It is a good idea to listen for such contradictions, because they may 
reflect ambivalent feelings or avoidance of painful experiences. During the interview, 
you may become aware of possible inconsistencies, which could be: (1) internal, between 
different parts of the story; (2) external, with another source such as a document or 
another respondent; or (3) between manifest and latent content, for example, between 
the words and the tone of voice. Rather than pouncing on them, it is a good idea to 
gently and tactfully inquire about them (“That’s interesting, it sounds like you have 
several different kinds of feelings about your clients. Can you tell me more about that?”). 

Sample Interview 

The following excerpt comes from a semi-structured interview in the Women Helping 
Women project (Pistrang et al., 2013), which followed the interview schedule 
described above. The participant (R) is a former patient who became a peer supporter; 
the interviewer (I) was Nancy Pistrang, this book’s second author. The interview has 
been edited for readability, and also shortened for didactic purposes; ellipses (...) 
indicate where cuts have been made. 

As Young and Willmott (1957) point out, there is not an exact correspondence between 
the questions in the schedule and those asked by the interviewer: the schedule is a vehicle 
for enabling the respondent to talk about the important issues. The excerpt of the 
transcript shown in the box focuses on the participant’s response to a single question in the 
interview schedule: “Were there any ways in which you benefited [from giving peer 
support]?” and illustrates the interviewer’s follow-up questions and reflections. 
Part of an interview transcript of a peer supporter of a woman 
with gynecological cancer (Women Helping Women project: 
Pistrang et al., 2013) 

I: We’ve talked quite a bit about how it [peer support] benefited [the patient]. 
And I’m just wondering whether you felt you benefited in any way from 
supporting her—how it was for you? 

R: I think it did a number of things for me, really. One, I think it allowed me 
to take a step back from my own situation. So, instead of being the patient, 
it allowed me to look at my own experience in a more objective way. It 
brought me out of my journey, if you see what I mean, to be able to look 
at it in a more objective way. It also nicely reminded me that really, it is a 
pretty tough thing to go through ... it just does remind you, actually, of 
how difficult it was and how far you’ve come ... So you kind of reflect on 
some of the stages you’ve been through ... [it] allowed me to put things in 
perspective. 

I: Right. It’s an interesting thing, that, really—that somehow through talking 
to someone else who’s earlier on in that journey, it somehow allows you to 
step out of it in a way, you’re saying, and look at it more objectively. What 
is it, do you think? What’s that process? What’s going on? 

R: It’s a weird kind of—almost like a parental thing, if that makes any sense. 

I: Can you say more about that? 

R: It’s almost like you get a chance to turn this really ugly, rotten, horrible 
thing that’s happened to you into something—a way of helping somebody 
else. And I think it then allows you to sort of flip your own feelings on the 
head and realize, “I’ve got something to share. I’ve got something to give 
from this that’s positive. I’ve got something that could perhaps impact 
somebody else’s life in a positive way through this—you know, through my 
own rotten experiences.” It kind of gives you a bit of a feeling, I guess, of 
pride ... that’s what I mean. In a certain parental way, you get to feel 
quite—almost protective over the person and—and encouraging—you 
become very encouraging. 

I: Yeah, and sort of nurturing another person. 

R: Yes. And in a way, I think it sort of drifts back to you. And you end up kind 
of realizing that you’ve probably not been doing quite enough nurturing 
of yourself. And again, that’s an important sort of thing. 





This excerpt illustrates how the interviewer carries out her aim—to understand the 
participant’s experience—by using questions to clarify and explore, and reflections to 
confirm understanding and encourage elaboration. It also shows the richness of the 
data that can come from a qualitative interview, and also gives a foretaste of the 
“qualitative overload” problem that is involved in analyzing the material (see 
Chapter 12). In the following sections, we will look in more detail at the procedures 
used in conducting qualitative interviews. 


QUANTITATIVE SELF-REPORT METHODS 

The literature on quantitative self-report methods is enormous, and we can only hope 
to scratch the surface here. More extensive treatments can be found in a number of 
specialist texts, for example, Bradburn et al. (2004), Butcher (1999), Dawis (1987), 
DeVellis (2012), Dillman et al. (2009), Marsden and Wright (2010), Saris and Gallhofer 
(2007), and Streiner and Norman (2008). For convenience, we will focus on written 
questionnaires with rating scales; however, everything that we have to say applies equally 
well to interviews and internet questionnaires designed to yield quantitative data. 


• The central quantitative self-report method is the written questionnaire or 
rating scale. 

• Questionnaire design may seem simple, but it is not—there is no shortage 
of badly designed questionnaires in circulation. 

• The central maxim is “take care of the respondent.” 

• Most questionnaires use a Likert scale. 

• Good items are clear, simple, and brief. 

• There are a number of issues in designing the response scale, such as the 
number of scale points, the type of anchors, and whether to use unipolar or bipolar scales. 

• Response sets, such as acquiescence and social desirability, refer to tendencies 
to respond to items independently of their content. They need to be taken 
into account when designing and interpreting self-report measures. 


As in other places in this book, we will describe the process from the viewpoint of 
constructing a measure, in order to give readers a better feel for the difficulties that 
are involved. The central point is that it is not just reliability and validity 
considerations that need to be taken into account when appraising a measure; it is worth 
looking closely at the fine detail of how the measure is put together. 

Steps in Measure Development 

If you are doing research involving a variable for which there is no adequate existing 
self-report instrument, you may need to construct your own measure. This is not a 
step to be undertaken lightly, as it is time consuming and requires skill to do well. 
However, because many areas are either undermeasured or are poorly measured, this 
is a common type of research. One often approaches a new research area only to find 
that no good measures exist, and then ends up by reformulating the research toward 
developing such a measure. (A common experience of researchers is to discover that 
such studies are often more widely cited and influential than their other research.) 

If you need to construct a measure, the steps are roughly as follows: 

• Having done a literature search to make sure that no existing instrument is suitable, 
develop a first draft of the scale based on theory, pilot qualitative interviews, or 
analysis of existing questionnaires. 

• Progressively pilot the scale on respondents nearer and nearer to the intended 
target population (known as pretesting), modifying it accordingly. Expect to take it 
through several drafts, for instance, first to colleagues, second to friends or support 
staff (ask them to point out jargon or awkward phrasings), third and fourth to 
potential respondents. It is often worthwhile running small informal reliability and 
possibly factor analyses on a pilot sample of 20 or 30 respondents to see whether 
any items should be dropped or added before doing the larger, formal study. 

• Once a satisfactory version of the scale has been developed, do a formal reliability 
study by giving the measure to a large sample (e.g., over 120 respondents) drawn 
from a population which approximates the population you are interested in. You 
can then examine its item characteristics (e.g., means and standard deviations), 
internal consistency, and factor structure. It is also typical to administer the measure 
twice to some of the participants, in order to assess its test-retest reliability. 

• If the reliability and factor structure are satisfactory, you can conduct appropriate 
validity studies (see Chapter 4), which examine the measure’s correlations with other 
criteria or constructs. (These studies may also be combined with the previous step.) 
The new measure is administered, along with a set of similar and different measures, 
such as a social desirability measure and measures that should not correlate with the 
new measure (to establish discriminant validity). It is also a good idea to use measures 
of more than one type or perspective, in order to reduce the problem of method 
variance (e.g., to use self-report measures plus observer ratings). The goal is to see 
whether the measure fits in with the pattern of correlations that would be predicted 
by the theoretical framework from which it was derived. 


Questionnaire Design 

Designing a questionnaire involves deciding on the topics to be covered and their 
sequence, writing the questions or items, and selecting an appropriate response scale. 
We will deal with each of these in turn. 

In all aspects of questionnaire design, the golden rule is “take care of the respondents.” 
Put yourself in their shoes and ask what the experience of being on the receiving end of the 
questionnaire is like. Make it as easy, rewarding, and free of frustration as possible. As part 
of the pilot testing, it is a good idea to fill out your questionnaire yourself (often a salutary 
experience) and give it to a few friends who will be able to give you constructive criticism. 

The goal is to avoid getting in the way of respondents’ ability to communicate 
their thoughts and experiences. Trying not to alienate your respondents makes 
sense not only from a general human relations point of view but also from a 
scientific one. Irritated people will not give you good data (or even any data at all: 
they may just discard your questionnaire). 

Topic Coverage and Sequence 

The questionnaire is often broken into subsections representing different topics or 
variables. The primary consideration is that, as a whole, it should adequately capture 
all of the concepts needed to answer the research questions. In other words, the data 
set should yield an answer to each of the research questions, or enable each of the 
hypotheses to be tested. Once this coverage has been achieved, the issue is then how 
to order the topic areas within the questionnaire. 

It is best to start off with easy, interesting, and nonthreatening questions that all 
respondents can answer (Dillman et al., 2009). This engages the respondents and 
helps to establish rapport with them: even a written questionnaire is a form of 
interpersonal relationship. Demographic questions (i.e., about the respondent’s age, 
sex, etc.) should usually be placed at the end of the questionnaire, as it is better to 
start with questions relevant to the topic of the study. 

Structured interviews often adopt a funnel approach, that is, they start out broadly 
and then progressively narrow down. This reduces the risk of suggesting ideas to the 
respondents or influencing their answers. The interview typically begins with open-ended 
questions, then moves in the direction of increasing specificity. The pollster 
George Gallup (see Sheatsley, 1983) recommended the following ordering for public 
opinion research (e.g., to study opinions about sexual harassment): (1) test the 
respondents’ awareness of, or knowledge about, the issue; then (2) ask about their 
level of interest or concern; then (3) about their attitudes; then (4) about the reasons 
for these attitudes; and finally (5) about the strength of their opinions. 

Item Wording 

Having established the coverage of topics, the next step is to write the individual 
questions or items. The wording of an item is of crucial importance, as the way that a 
question is phrased can determine the kind of response that is given (Bradburn et al., 
2004; Saris & Gallhofer, 2007; N. Schwartz, 1999). It is worth heeding some key 
principles of item construction: 

Neutrality. The language of the item should not influence the respondent, that is, 
it should not suggest an answer. Possible errors take the form of leading questions 
(questions which are not neutral, which suggest an answer), questions with implicit 
premises (built-in assumptions that indicate the questioner’s viewpoint) and loaded 
words or phrases (ones that are emotionally colored and suggest an automatic feeling 
of approval or disapproval). Some examples of such problematic questions follow, 
with commentary after each: 

“Do you think that ...?” and “Don’t you think that ...?” 

These are leading questions that pull for a “yes” response. 

“When did you stop beating your wife?” 
This question has become the clichéd example of an implicit premise; it 
assumes the respondent has been beating his wife (Payne, 1951). Such questions are 
usually to be avoided. However, there are times when implicit premises are useful for 
normalizing behavior, by giving the respondent permission to talk honestly. 
For example, studies of sexual behavior may sometimes use questions such as: “How 
old were you the first time you ...?”, rather than saying “Did you ever ...?” 

“How often do you refer to a counselor?” 

This question is a subtler variant of the implicit premise; it assumes that the 
respondent does refer to a counselor. It would be better to include “if at all,” or, even 
better, to have two separate questions, for instance, “Do you refer ...?” and “If yes, 
how often ... ?” 

“Why don’t you refer to a counselor more often?” 

This question assumes that referring more often is desirable. A better question would 
be: “What factors influence your referral decisions?” 

“How often did you break down and cry?” 

“Break down” is a loaded phrase which gives crying a negative connotation. In this 
case, it could simply be omitted. 

Clarity and simplicity. It is better to use simple, clear, everyday language, 
adopting a conversational tone. Make sure that the item does not demand a reading 
level or vocabulary that is too advanced for your respondents. In particular, try to 
avoid psychological jargon (it is helpful to ask a nonpsychologist to read your 
questionnaire to detect it). Psychologists often become so used to their own 
technical language that they forget that members of the public do not understand 
it or find it strange. This is another reason why it is vital to pilot the questionnaire 
on ordinary people. 

Specificity. Lack of specificity gives rise to ambiguities, for example: 

“Do you ever suffer from emotional problems?” 

The phrase “emotional problems” means different things to different people. Therefore, 
it is better to define it or use alternatives. On the other hand, you could leave the 
phrase as it is, if you want to leave it open to people’s own interpretations. 

“Do you suffer from back pain?” 

It is better to give a time frame, for example, “in the last four weeks,” and also to 
specify the anatomical area, perhaps with the aid of a diagram (since, for example, 
respondents may not know if shoulder or neck pain should be included). 

“Do you like Kipling?” (“Yes, I kipple all the time.”) 
People will often respond to a question that they do not understand, rather than 
saying explicitly that they do not understand it. 

Single questions. It is better to ask one thing at a time. Problems arise with 
double-barreled questions, that is, questions that have two independent parts: 

“Were you satisfied with the supervision and range of experience at your last clinical 
placement?” 

The respondent could be satisfied with the supervision, but not the range of experience. 

“Were you satisfied with your placement or were there some problems with it?” 

The two parts are not mutually exclusive: the respondent could be satisfied with 
a placement even though there were problems with it. 

“In order to ensure patients take their medication, should psychiatrists be given more 
powers of compulsory treatment?” 

The respondent could disagree with the implications of the initial premise, but agree 
with the main statement. 

Brevity. Short items are preferable. Sentences with multiple clauses can be difficult 
to process. As a final example of what to avoid, here is a classic of its kind, from no less 
a figure than the behaviorist John Watson. This monstrous item violates this and most 
other principles of item writing: 

Has early home, school, or religious training implanted fixed modes of reacting which 
are not in line with his present environment—that is, is he easily shocked, for example, at 
seeing a woman smoke, drink a cocktail or flirt with a man; at card playing; at the fact 
that many of his associates do not go to church? (Watson, 1919, quoted by Gynther & 
Green, 1982, p. 356) 

Constructing the Response Scale 

With a rating scale, the respondent gives a numerical value to some type of judgment. 
There is a wide variety of scale types: Likert scales, Guttman scales, Thurstone scales, 
rankings, etc. (Nunnally & Bernstein, 1994). Here we will focus on by far the most 
commonly used one, the Likert scale. This consists of two parts: the items (a set of 
statements, such as “I feel tense”) and the response scale, a set of alternatives of 
increasing intensity, with an integer numerical scale and verbal anchors (see Figure 6.1 
for some examples). Guttman scales are also sometimes used. The format is to ask the 
respondent to choose between a number of statements of increasing intensity. 
The Beck Depression Inventory uses a Guttman scale approach. 

Agreement: How much do you agree or disagree with each of the following statements? 
   1 Disagree strongly | 2 Disagree moderately | 3 Disagree mildly | 4 Neither agree nor disagree | 5 Agree mildly | 6 Agree moderately | 7 Agree strongly 

Frequency: How often do you ...? 
   0 Never | 1 Seldom | 2 Sometimes | 3 Often | 4 Very often 

Quantity/proportion: How many ...? 
   0 None | 1 Very few | 2 Some | 3 Very many | 4 All 

Degree/strength: How (much) ...? 
   0 Not at all | 1 Slightly | 2 Moderately | 3 Very (much) 

Figure 6.1 Examples of anchor words for Likert scales 


Just as the form of the question can influence the response, so can the form of the 
response scale (Saris & Gallhofer, 2007; N. Schwartz, 1999). The major considerations 
in constructing response scales are: 

How many scale points? The number of scale points can range from two upwards. 
(Scales with two choices are known as binary or dichotomous; those with three or more, 
as multiple choice.) There may be logical reasons for using a certain number of responses: 
for example, some questions clearly demand a yes/no answer. However, it is more 
frequently the case that the response scale must be decided by the researcher. The main 
issues are: 

• There is a lot of researcher folklore about optimal numbers of scale points; 
however, item response theory (Bond & Fox, 2007) now provides empirical 
methods for evaluating how informants actually use rating scale categories, and in 
particular whether they are able to reliably discriminate between neighboring points. 

• Rating scale category use appears to interact with item content: a five-point rating 
scale that works for one item or type of content might not work for another item 
or type of content. For example, it appears that informants can reliably 
discriminate more rating scale points for individualized outcome problems that they have 
created themselves than for general psychological distress measures (Elliott et al., 
2004). 

• Although it is sometimes claimed that reliability increases with more scale points 
(Nunnally & Bernstein, 1994), a common finding is that informants cannot 
reliably discriminate two or sometimes even three of the middle scale points on 
the standard five-point rating scales used on measures of general psychological 
distress. In any case, there seem to be diminishing returns beyond five points 
(Lissitz & Green, 1975). 
• More can be less when it comes to rating scale categories: adding more categories 
does not increase reliability, and often decreases it, because when asked to discriminate 
beyond their ability, informants tend to answer randomly (Bond & Fox, 2007). 

• The extreme ends of rating scales tend to be underutilized by informants either 
because the events are infrequent (as might be expected under normal curve 
assumptions) or because they actively avoid using the extreme ends of the rating 
scale, a phenomenon known as central tendency bias. This means that the upper 
and lower ends of rating scales often need careful attention to wording. (Relatedly, 
there is more measurement error at the extreme ends of rating scales; the usual 
reliability statistics only apply to the middle of the rating scale.) 

• Instead of using discrete scale points, another approach is to ask respondents to 
put a mark on a 10 centimeter line (a visual-analog scale), and then use a ruler to 
make the measurement (McCormack, Horne, & Sheather, 1988). This is used, 
for example, in pain research, to assess the intensity of the respondent’s pain 
experience. However, Thomee, Grimby, Wright, and Linacre (1995) reported 
that, while such scales create an illusion of fine discrimination, they are in fact 
equivalent to seven- or even four-point scales. 
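A full item response analysis of rating scale use requires dedicated software, but the kind of question it asks can be illustrated with a much cruder descriptive screen. The sketch below, loosely modeled on the category diagnostics reported in Rasch analyses, checks for each category of a single item (a) how many respondents used it and (b) whether the average score of those respondents on the rest of the scale rises as the category rises. The function names, the default minimum of 10 responses per category, and the use of raw rest scores in place of IRT person estimates are all assumptions for illustration.

```python
from collections import Counter


def category_diagnostics(responses, rest_scores, categories):
    """For one item, return (category, count, mean rest score) for each category.

    responses   -- each respondent's rating on the item
    rest_scores -- each respondent's total on the *other* items (a rough,
                   hypothetical stand-in for IRT person estimates)
    """
    counts = Counter(responses)
    report = []
    for c in categories:
        chosen = [s for r, s in zip(responses, rest_scores) if r == c]
        mean_rest = sum(chosen) / len(chosen) if chosen else None
        report.append((c, counts[c], mean_rest))
    return report


def flag_problems(report, min_count=10):
    """Flag sparsely used categories and breaks in the expected ordering."""
    problems = []
    previous = None  # (category, mean rest score) of the last category that was used
    for c, n, mean_rest in report:
        if n < min_count:
            problems.append(f"category {c}: only {n} response(s)")
        if mean_rest is None:
            continue  # unused category: nothing to compare
        if previous is not None and mean_rest <= previous[1]:
            problems.append(f"category {c}: mean rest score not above category {previous[0]}")
        previous = (c, mean_rest)
    return problems


if __name__ == "__main__":
    # Hypothetical data for one item on a 0-4 frequency scale like that in Figure 6.1.
    ratings = [0, 1, 1, 2, 2, 2, 3, 1, 2, 3]
    rest = [4, 9, 7, 12, 10, 11, 12, 8, 13, 15]
    report = category_diagnostics(ratings, rest, categories=range(5))
    for line in flag_problems(report, min_count=2):
        print(line)
```

Sparse extreme categories and non-increasing averages are the kinds of symptoms that, in a proper IRT analysis, would prompt collapsing neighboring categories or rewording the anchors.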

Unipolar or bipolar. Response scales can either be unipolar or bipolar. A unipolar scale 
has only one labeled construct, which varies in degree. For example, a scale measuring 
intensity of pain might range from “No pain at all” to “Unbearable pain.” A bipolar scale 
has opposite descriptors at each end of the scale (e.g., “Happy” at one end and “Sad” at 
the other). In Figure 6.1, the Agreement scale is bipolar; the others are unipolar. 

Mid-point. Bipolar scales may or may not have a mid-point, representing such 
options as “Don’t know,” “Unsure,” “Neutral,” or “Neither one way or the other.” 
In other words, the scales may have either an odd or an even number of steps. 

The argument against having a mid-point is that people usually hold an opinion, 
one way or the other, which they will express if you push a little. This procedure is 
known as forced choice: for example, “Do you agree or disagree with the following 
statements?” Forced choice makes data analysis easier, because respondents can be 
divided into those expressing a positive and those expressing a negative opinion. 
Furthermore, according to item response analyses, mid-points sometimes misscale 
and are best left out (e.g., Bourke-Taylor, Pallant, & Law, 2014). However, if a 
question is worded well, you should not get many middle responses in the first place. 

The argument for having a mid-point is that neutrality represents a genuine 
alternative judgment, and so it is coercive not to allow respondents to express their 
opinions in the way that they want to. 

Anchoring. Anchoring refers to labeling the points of the scale in words as well as 
numbers. You usually want to define the steps explicitly, so that people are rating to 
the same criteria. However, this does make two measurement assumptions: (1) that 
the scale has interval properties (see Chapter 4), that is, that its steps are all equal 
(for example, that the distance between “not at all” and “slightly” is the same as 
between “very” and “extremely”); and (2) that people understand the same thing 
by all the adjectives. Try to avoid modifiers with imprecise meanings, for example, 
“quite” can sometimes intensify (equivalent to “very”) and sometimes diminish 
(equivalent to “somewhat”). 
Sometimes researchers just anchor the end-points of the scales, as in visual-analog 
scales and semantic differentials (which use pairs of bipolar adjectives, such as 
good-bad, hard-soft, heavy-light). It is also possible to anchor alternate scale points as a 
compromise between anchoring every point and only anchoring the extremes. 
However, not anchoring scale points leaves their meaning unclear. 

Response Sets 

Response sets refer to the tendency of individuals to respond to items in ways not 
specifically related to their content (Bradburn, 1983; Nunnally & Bernstein, 1994). 
They may be conceptualized as personality variables in their own right. The most 
commonly encountered response sets are acquiescence and social desirability. 

Acquiescence (“yea-saying”) refers to a tendency to agree rather than disagree. 
The classic example of acquiescence problems is with the California F-scale (Adorno, 
Frenkel-Brunswik, Levinson, & Sanford, 1950), which was developed to measure 
authoritarian tendencies (the F stands for fascist). Early item-reversal studies, in which 
some of the items were replaced by their inverse, seemed to show that this scale was 
mostly measuring acquiescence rather than authoritarianism (although there is some 
dispute about this conclusion; see Rorer, 1965). 

The usual way to get around acquiescence problems is to have an equal number of 
positively and negatively scored items in the scale. For example, in an assertiveness 
scale, the item “If someone jumps to the head of the queue, I would speak up” would 
be scored in the positive direction, while “I tend to go along with other people’s 
views” would be scored in the negative direction. When the negatively scored items are 
reverse-scored and all items are then averaged, any tendency to acquiesce cancels out. 
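The reverse-scoring step can be made concrete with a short sketch. It assumes the 1-7 agreement scale shown in Figure 6.1, on which a reversed score is 8 minus the original rating (more generally, the scale minimum plus the maximum minus the rating). The item names are hypothetical labels for the two assertiveness items quoted above.

```python
def reverse_score(rating, scale_min=1, scale_max=7):
    """Reverse a rating, e.g. on a 1-7 scale: 1 <-> 7, 2 <-> 6, 4 stays 4."""
    return scale_min + scale_max - rating


def scale_score(responses, negatively_keyed, scale_min=1, scale_max=7):
    """Mean item score after reverse-scoring the negatively keyed items.

    responses        -- dict mapping item name to rating
    negatively_keyed -- set of item names scored in the negative direction
    """
    scored = [
        reverse_score(rating, scale_min, scale_max) if item in negatively_keyed else rating
        for item, rating in responses.items()
    ]
    return sum(scored) / len(scored)


if __name__ == "__main__":
    negative = {"go_along"}  # hypothetical name for "I tend to go along with ..."
    # A respondent who agrees strongly with both items, whatever their content:
    yea_sayer = {"speak_up": 7, "go_along": 7}
    print(scale_score(yea_sayer, negative))  # 4.0, the scale mid-point
    # A genuinely assertive respondent:
    assertive = {"speak_up": 7, "go_along": 1}
    print(scale_score(assertive, negative))  # 7.0
```

With balanced keying, a pure yea-sayer lands at the mid-point rather than at the assertive end of the scale; if all items were keyed in the same direction, the same response style would push the score upward.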

Acquiescence has been noted as a particular problem when working with people 
with mental retardation (“intellectual disabilities” in the UK terminology). The title 
of Sigelman, Budd, Spanhel, and Schoenrock’s (1981) paper, “When in doubt, say 
yes,” is often quoted in this context. Sigelman et al. recommend some guidelines for 
good practice, for example, that researchers avoid “yes/no” questions and instead 
use open-ended questions with this population. However, Rapley and Antaki (1996) 
argue, from a conversation analysis point of view, that the assumption of an 
acquiescence bias in people with mental retardation is not fully substantiated by the 
evidence. 

Social desirability refers to a tendency to answer in a socially acceptable way (“faking 
good”), either consciously or unconsciously (Crowne & Marlowe, 1960, 1964). This 
is especially a problem in occupational testing, as the following humorous advice for 
aspiring businessmen illustrates (it also embodies the outdated assumption that 
business executives are men): 

When an individual is commanded by an organization to reveal his innermost feelings, he 
has a duty to himself to give answers that serve his self interest rather than that of The 
Organization. In a word, he should cheat ... Most people cheat anyway on such tests. 
Why then, do it ineptly? ... When in doubt about the most beneficial answer, repeat to 
yourself: I loved my father and my mother, but my father a little bit more. I like things 
pretty much the way they are. I never worry about anything. (Whyte, 1959, p. 450, 
quoted in Crowne & Marlowe, 1964) 
In clinical research, it is also important to consider the opposite tendency, that is, 
where respondents may attempt to “fake bad.” This may occur in forensic contexts, 
when offenders may be trying to obtain a lighter sentence or a softer prison regime; 
in the case of insurance claims for psychological trauma, where people may be attempting 
to get a larger settlement; or in the context of being on a waiting list for psychological 
therapy, where clients may be trying to get help sooner. 

Possible ways to get around social desirability problems are: 

• Embed a social desirability scale within the main instrument, such as the 
Marlowe-Crowne (Crowne & Marlowe, 1960) Social Desirability Scale, the L (Lie) scale on 
the Eysenck Personality Questionnaire (EPQ: Eysenck & Eysenck, 1975) and the 
Minnesota Multiphasic Personality Inventory (MMPI: Hathaway & McKinley, 
1951), and the K (Defensiveness) scale on the MMPI. These provide a direct 
measure of the extent of socially desirable responding. Factor-analytic studies have 
found these scales to have two separate components, self-deception and impression 
management (Paulhus, 1984). 

• Use a forced choice format, where the respondent chooses between alternatives of 
equal social desirability. For example, the Edwards Personal Preference Schedule 
(Edwards, 1953), which measures personality dimensions such as achievement 
and affiliation, has paired items balanced for social desirability. However, some 
respondents may object to the constraining nature of such instruments. 

• Use “subtle items,” on which the acceptability of the response is not apparent, for 
example, on the MMPI (Weiner, 1948). However, this practice raises questions 
about the face validity of the scale, and is not without controversy (Hollrah, 
Schlottmann, Scott, & Brunetti, 1995). 

Assembling the Questionnaire and Looking Ahead 

Having designed the questions and response scales, the final task is to assemble them 
into a coherent questionnaire. Once again, the maxim “take care of the respondent” 
should be primary. Try to make the experience of completing the questionnaire as 
engaging as possible, and minimize anything which might exhaust or irritate 
respondents. 

Make the questionnaire look attractive by giving it a pleasing layout with readable 
typefaces, and use language which is easily understandable and welcoming. It also helps 
respondents work through the questionnaire if the topics are ordered in a logical 
sequence, and the transitions between different topic areas are made as smooth as 
possible. Simple things, such as introducing each section with phrases like “This section 
asks about...” can make the respondent’s task easier. 

Think about data analysis before the final draft, as you may want to print data entry 
instructions onto the questionnaire. If possible, try some quick analyses to examine 
your main research questions on the pilot sample. 

We will deal with sampling in general in Chapter 10, but there are some issues 
specific to mail surveys. Dillman et al. (2009) suggest aiming for a response rate of 
over 60%, and sending out reminders to increase the initial response. Bear in mind that 
people who return questionnaires are not usually representative of the whole target 
population: they tend to be higher on literacy, general education, and motivation. 
In order to conceptualize what would lead to sample bias, ask yourself why someone 
would not fill out the questionnaire. It is sometimes possible to estimate bias by 
comparing respondents with nonrespondents on key variables. For instance, in a client 
satisfaction survey, it may be possible to see if the clients who filled out the survey 
questionnaire differed from those who did not, in terms of severity of problems or 
length of time in therapy. 
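As a rough illustration of this respondent versus nonrespondent comparison, the sketch below computes the response rate and a standardized mean difference (Cohen’s d with a pooled standard deviation) on one variable that is known for everyone approached, such as number of sessions attended. The variable, the figures, and the choice of an effect size rather than a significance test are assumptions for illustration. A difference near zero on the variables you can check does not rule out bias on those you cannot.

```python
from statistics import mean, stdev


def response_rate(n_returned, n_sent):
    """Proportion of distributed questionnaires that were returned."""
    return n_returned / n_sent


def standardized_difference(respondents, nonrespondents):
    """Cohen's d (pooled SD) for respondents minus nonrespondents."""
    n1, n2 = len(respondents), len(nonrespondents)
    s1, s2 = stdev(respondents), stdev(nonrespondents)  # sample SDs
    pooled_sd = (((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)) ** 0.5
    return (mean(respondents) - mean(nonrespondents)) / pooled_sd


if __name__ == "__main__":
    # Hypothetical client satisfaction survey: sessions attended, from clinic records.
    returned = [12, 8, 15, 10, 20, 9, 14]
    not_returned = [4, 6, 10, 3, 7]
    n_sent = len(returned) + len(not_returned)
    print(f"response rate: {response_rate(len(returned), n_sent):.0%}")
    print(f"d = {standardized_difference(returned, not_returned):.2f}")
```

A large positive d here would suggest that clients who stayed longer in therapy were over-represented among those who returned the questionnaire.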

The internet is an important alternative medium for conducting surveys, in the form of 
the internet-based questionnaire (Dillman et al., 2008). Several applications exist for 
helping researchers construct and conduct web-based surveys, such as SurveyMonkey, 
Opinio, and Qualtrics. The internet also has the advantage of providing access to a 
wider potential sample of respondents, which is particularly important with 
difficult-to-access populations (see also Chapter 10). For 
example, Barry, Elliott, and Evans (2000) found that Arab immigrants to the United 
States were more willing to respond when approached in this way than face to face, 
and that respondents recruited via the internet did not appear to differ from those 
obtained in the usual way. 

Integrating Qualitative and Quantitative Self-report Methods 

It is worth re-emphasizing that our separation of interview and questionnaire, and 
qualitative and quantitative methods was for didactic, not practical, purposes. Our view 
is that all combinations of self-report/observational, qualitative/quantitative data 
collection methods have their uses. It is possible to use written questionnaires within 
observational protocols and to combine open-ended and closed-ended questions in 
the same questionnaire or interview. For example, it is often a good idea to begin and 
end structured quantitative interviews with general open-ended questions. Questions 
at the beginning give the respondents a chance to talk before they have been influenced 
by the researcher’s framework, and questions at the end give them a chance to add 
anything that may not have been addressed within that framework. 


CHAPTER SUMMARY 

This chapter has covered the procedures for constructing self-report methods, such as 
interviews and questionnaires. The advantages of self-report are that it gives the person’s 
own perspective, and that there is no other way to access the person’s own experience. 
Its disadvantage is that there are potential validity problems: people’s reports may contain 
errors due to deception, inaccurate recall, or the unavailability of the information to 
conscious processing. 

There are both qualitative and quantitative approaches to self-report. The main 
qualitative self-report approach is the semi-structured interview. This allows a flexible 
interview style, with probes where necessary, and helps respondents describe their 
own experience in their own words. Qualitative interviewing is a distinct skill, 
different from clinical interviewing, including interviewing for psychological assessment. 

The main quantitative self-report approach is the written questionnaire using a 
Likert scale. There are a number of principles to follow in constructing quantitative 
self-report instruments, both concerning the wording of the items and the form of 
the response scale. Response sets, such as acquiescence and social desirability, refer 
to tendencies to respond to items independently of their content. They need to be 
taken into account when designing and interpreting self-report measures. 

Although we have discussed qualitative and quantitative approaches separately, they 
can easily and fruitfully be combined within a single interview or questionnaire, or 
within a study as a whole. 


FURTHER READING 

Josselson (2013), Kvale and Brinkmann (2009), and Rubin and Rubin (2005) give 
valuable guidance on constructing and conducting qualitative interviews. There is an 
enormous number of references on assembling quantitative self-report instruments. 
Ones that we have consulted in preparing this chapter include Bradburn et al. (2004), 
Butcher (1999), Dawis (1987), DeVellis (2012), Dillman et al. (2009), Marsden and 
Wright (2010), Saris and Gallhofer (2007), and Streiner and Norman (2008). 


QUESTIONS FOR REFLECTION 

1. The limitations of self-report methods are well known, and yet the vast majority 
of psychological research uses self-report methods. Why do you think this is? 
Is this a problem, as some have said? 

2. Qualitative researchers disagree about what constitutes “leading,” and how 
careful to be in avoiding leading responses, for example, whether it is OK to ask 
closed questions or to paraphrase back to the informant what he or she says. What 
do you think? 

3. How detailed should an interview schedule be? Qualitative researchers sometimes 
find themselves developing increasingly elaborate interview schedules (say with 
20 or more questions), while others argue that a single question is often enough 
(“Tell me the story of...”). What do you think? What are the consequences of 
simple versus complex interview schedules? 

4. Develop a brief interview schedule consisting of three or four central questions 
on your research topic. Do a 5-10 min interview with a friend or family member 
who has had an experience that approximates your topic. Describe what you 
learned about (a) your research topic and (b) your interview schedule. 

5. Select a psychological construct you’re interested in and construct a very short 
(e.g., 6-8 item) self-report instrument to assess it. Reflect on alternative ways to 
word the items and alternative formats for the response scale. 
KEY POINTS IN THIS CHAPTER 

• Observation provides a direct measure of behavior. 

• It is useful when precise records are needed, or for behaviors that are not 
amenable to self-report. 

• The main qualitative approaches are participant observation and text-based 
methods. 

• Quantitative approaches use structured methods to give precise counts of 
behavior. 

• There are several different methods for conducting quantitative observation. 

• Careful selection, training, and monitoring of raters are important to achieve 
good reliability. 


In the biological and physical sciences, various forms of observation are the only possible 
data collection method, since the objects of study are animals, plants, and inanimate 
objects. In psychology, however, researchers have a choice: whether to use observation, 
self-report, or a combination of the two. Yet observational methods have generally been 
underused in psychology, to the potential detriment of the discipline as a whole 
(Baumeister, Vohs, & Funder, 2007). 

Observation can take many forms. You may observe the person in their own natural 
setting, such as at home or at school. The ecological psychology studies of Roger 
Barker and his colleagues (Barker, Wright, Schoggen, & Barker, 1978), which provide 
extensive accounts of behavior in community settings, are classic examples of this 
type. Or you may observe under standardized conditions in the clinic or laboratory. 
The Strange Situation (Ainsworth, Blehar, Waters, & Wall, 1978), which assesses 
a young child’s reactions to being separated briefly from his or her parent under 
standardized conditions, is a widely used example in attachment research. 

Observation is a special activity, a form of disciplined description that does not 
come naturally to most people. It is different from evaluation (of whether something 
is good or bad), explanation (of why something happened), or summary report. 
Observational methods thus require special training. 

We are referring here to observation as a measurement method, not as a research 
design. Correlational research designs are sometimes called passive observation studies 
(see Chapter 8); this is something of a misnomer, as they may not use observation at 
all. Observational data may be used in descriptive, correlational, or experimental 
designs: there is no logical link between the measurement method and the design. 

Advantages and Disadvantages 

The advantage of observation is that it is a direct measure of behavior, and thus can 
provide concrete evidence of the phenomenon under investigation. For example, if 
you are studying couples’ communication, the members of a couple may say in an 
interview that they “can’t communicate and always end up getting nowhere.” 
However, if you actually observe them interacting, you usually get a clearer indication 
of the nature of their problems: for example, one member of the couple may be 
critical and the other may withdraw. Another example is in studying children referred 
for behavior problems, where a father might say “my daughter is disobedient.” 
Observing the interaction between the father and his daughter allows you to see for 
yourself what actually goes on between them. 

Furthermore, observation enables you to assess the behavior within its context and 
examine sequences of behavior over time. Observing children within their family setting 
and also in the classroom allows the researcher to identify and measure situational 
variables (e.g., critical remarks from siblings or peers) that might contribute to problem 
behaviors, and to see which events precede or follow other events. 

Observation is also good for studying behavior that people may not be aware of 
(e.g., nonverbal behavior) or behavior that is inaccessible using self-report methods 
(e.g., because of denial, distortion, or simply forgetting). It is also good for research 
involving individuals with limited language ability, such as young children or people 
with cognitive impairments. 

The disadvantage of observation is that it can only be used to answer certain research 
questions, principally where you are interested in overt behaviors. Often research 
questions are more complex than this, and the overt behavior is only one aspect. Observational 
methods can, however, be used to study internal processes, in two ways. One is that the 
research participant can be the observer (i.e., engage in self-monitoring, see below). 
The other is that observations of behavior can be used to make inferences about 
cognitive processes or emotional states, for example using the emotional Stroop test 
(Williams, Mathews, & MacLeod, 1996), or using projective tests such as the Rorschach 
or the Thematic Apperception Test (TAT) (Westen et al., 1999). 

Another disadvantage is that observational studies often have problems with 
reactivity of measurement: people may behave quite differently if they know they are 
being observed. A potential solution to the reactivity problem is to observe covertly, 
but this clearly raises ethical issues about deception (which we address below). 
Qualitative and Quantitative Observation 

Like self-report methods, observational data collection methods have been developed 
independently within the qualitative and the quantitative traditions. 

The distinction between the two approaches is not always clear-cut (qualitative 
material can, for example, be analyzed quantitatively using content analysis), but 
this chapter will look at each tradition in turn. 


QUALITATIVE OBSERVATION 

In qualitative observation, the observer attempts to record a narrative account, 
which, like a literary description, brings the scene to life. However, unlike a literary 
description, the account attempts to be explicit and systematic rather than 
metaphorical and intuitive. 

As we discussed in Chapter 5, the historical roots of qualitative methods lie in ancient 
Greek and medieval histories and travelogues. Systematic qualitative observation as a 
research method was developed as part of the ethnographic approach in anthropology, 
as, for example, in the early work of Malinowski (1929) in the South Pacific (see 
Emerson, 2001). It also was found in medical case studies, Freud (1905/1977) being 
the outstanding example within psychological medicine. The data in such case studies 
are often not purely observational, as the clinician will also draw upon self-report data 
(to a major extent in the case of psychoanalytic case studies). 

A number of different approaches fall under the qualitative observation umbrella. 
This section examines two major ones: participant observation and text-based 
methods. (The analysis of data obtained from such methods is covered in Chapter 12.) 


Participant Observation 


Features of participant observation: 

• Its roots lie in the ethnographic approach to anthropology. 

• It involves the researcher becoming “immersed” in the setting and taking 
extensive field notes. 

• Methodological problems include reactivity and observer bias. 

• Covert observation raises ethical issues. 


Participant observation refers to a procedure in which the observer enters an 
organization or social group (such as a psychiatric hospital or a youth gang) in order 
to gain first-hand experience of its workings. It is characterized by a period of intense 
social interaction between the researcher and the people being observed, in their own 
setting, during which the data are collected unobtrusively and systematically 
(Angrosino, 2007; Emerson, 2001; Miller, Hengst, & Wang, 2003; Taylor & Bogdan, 
1998). Thus, participant observation involves: (1) the observer’s immersion in the 
situation; (2) systematic, but unstructured observation; and (3) detailed recording of 
observations, generally from memory. (Note that the term “participant” is somewhat 
ambiguous in this context, as it can refer both to the researcher—the participant 
observer—and also to the people being observed—the research participants.) 

The observer’s role in the setting can be anywhere on the continuum from a 
complete participant, such as when Goffman worked as a mental hospital aide to make 
the observations in Asylums (1961), to a complete observer, such as when traditional 
ethnographers lived in cultures which they were not a part of. Taylor and Bogdan 
(1998) warn about the dangers of observing a setting with which one is overly familiar 
(due either to friendship or to expertise), as the study may be compromised by the 
researcher’s inability to take multiple perspectives and by the temptation to censor 
reports or data which may offend colleagues or friends. 

As the object of study is more often an organization or social group than an 
individual, participant observation is more compatible with the frameworks of sociology 
and anthropology than with that of psychology. It is particularly associated with the Chicago 
School of sociology and with the sociology of deviance (Emerson, 2001). Whyte’s 
(1943) Street Corner Society, a study of Italian-American youth gangs, is a classic 
example of the genre. Whyte, who was a researcher at Harvard University, spent several 
years living in the Boston community that he was studying and talking to key 
informants (mostly members of the gang) in order to understand the structure of their 
organization. His observations involved joining in with their everyday activities, such 
as gambling and bowling. After each period of observation, he wrote extensive 
field notes, which he later analyzed with the help of one of the informants in his study. 

The narrative case study (see Chapter 9) also can be considered a form of participant 
observation, at least where the focus is on describing the therapeutic process, 
rather than on giving an account of the client’s history. Studies of individual therapy 
represent the psychologically interesting situation where one member of a dyad is 
observing the development of that dyad. There are many such accounts from the 
perspective of both the therapist and the client, and Yalom and Elkin (1974) 
interestingly combine their parallel accounts of the same therapy in one volume. 

Research Questions 

In line with the phenomenological approach, most participant observation researchers 
try to start with no preconceptions about the phenomena under study. They will often 
go through a process of bracketing, that is, an attempt to identify their 
preconceptions and set them to one side (see Chapter 5). Anthropologists, in particular, will 
attempt to set aside their ethnocentric biases when observing other cultures. However, 
as we discussed in Chapter 2, disinterested, theory-free observation is an 
unattainable ideal, since researchers always observe from within a theory or 
worldview that says what is important and what is trivial. The aim is to be aware of 
and to minimize, rather than eliminate, one’s biases. In any case, the 
participant observer usually attempts to start the observation unconstrained by prior 
hypotheses or specific variables of interest. 

In practice, participant observation research usually has a clear focus (e.g., to study 
the social structure of a psychiatric in-patient ward). The research questions are 
usually discovery-oriented. Participant observation is often conducted within the 
grounded-theory approach (see Chapter 5), in which the theory evolves as the study 
progresses. 
Pragmatics 

As we described in Chapter 3, you often gain access to the research setting via a 
gatekeeper, for example an administrator or senior doctor who decides whether to allow 
you into the organization. It is worth starting field notes at this point, since the 
process of negotiating entry says much about the workings of the organization that 
you are entering. Organizations that are conflict-ridden, suspicious, or highly 
bureaucratic will each have their characteristic ways of admitting (or excluding) outsiders. 

Once in the setting, you may develop a set of key informants who provide in-depth 
accounts. However, be wary of over-reliance on any one informant, or of implicitly 
selecting informants whose views agree with your own; try to obtain stories from a 
variety of perspectives. 

In the participant observation tradition, the period of observation is fairly extensive, 
usually lasting several months. The researcher is initially passive and works at 
establishing rapport, and attempts to avoid being forced into roles (e.g., “volunteer”). Some 
guidelines (e.g., Taylor & Bogdan, 1998) suggest that the researcher limit observation 
sessions and observe at different days and times (e.g., nights, weekends). 

It is useful to pay special attention to any unusual use of language in the setting, as 
this can often be a clue to important aspects of its structure (Taylor & Bogdan, 1998). 
The vocabulary used by staff to refer to clients may give important clues to their 
underlying feelings towards them. For example, do staff in a drug dependency unit refer to 
their clients as “patients,” “addicts,” or “junkies,” or is there some local terminology 
by which they distinguish between different types of clients? 

Field Notes 

The observations are recorded in the form of field notes, which describe the setting 
and the people in it (possibly including a diagram), as well as their verbal and 
nonverbal behavior (Emerson, 2001; Emerson, Fretz, & Shaw, 2011; Taylor & Bogdan, 
1998). Good field notes bring the scene to life. In addition, things that do not make 
sense should be recorded for later clarification, and your own actions should also be 
noted, to help you judge your effect on the setting. Finally, it is important to differentiate the 
behavior you are observing from your own reactions and interpretations; the latter 
should be noted and labeled as, for example, “Observer’s Comments.” As in all 
research, try to separate description from evaluation and be aware of how your 
preconceptions may be influencing your observations. 

The researcher does not usually take notes or make recordings during the 
observation period, as this often distracts those being observed and is more likely to influence 
their behavior. Part of the skill of being a participant observer lies in developing your 
memory. In order to prevent memory overload, limit your time in the setting to an 
hour or two and write the notes immediately after leaving the field. Remembering key 
words and drawing a diagram of the setting are useful strategies. The guiding principle 
(which applies to other areas of life too) is: “If it’s not written down, it never 
happened.” 

To illustrate, here is an excerpt from field notes taken during Taylor and Bogdan’s 
(1998) study of institutions for the mentally retarded (UK: people with intellectual 
disabilities). The field notes include detailed description as well as the observer’s 
comments (labeled as O.C.): 
As I get to the dayroom door, I see that all the residents are in the room. I can only 
see two attendants: Vince and another younger man. (O.C. It’s interesting how I 
automatically assume that this other man is an attendant as opposed to a resident. 
Several hints: long hair, moustache, and glasses; cotton shirt and jeans, brown leather 
boots. He’s also smoking a cigarette, and a resident, Bobby Bart, is buffing his shoes 
with a rag. Thus this attendant’s dress and appearance differ from that of the 
residents.) Vince, who is 21, is wearing jeans, brown leather boots, and a jersey that has 
“LOVE” printed on it. He has long hair, sideburns, and a moustache. 

I wave to Vince. He half-heartedly waves back. (O.C. I don’t think that Vince has 
quite gotten used to me coming.) The other attendant doesn’t pay any attention to me. 

Several residents wave or call to me. I wave back. 

Kelly is smiling at me. (O.C. He’s obviously happy to see me.) I say to Kelly, “Hi, 
Bill, how are you?” He says, “Hi, Steve. How’s school?” “OK.” He says, “School’s a 
pain in the ass. I missed you.” (O.C. According to the attendants, Kelly attended 
school at the institution several years ago.) I say, “I missed you too.” 

I walk over to Vince and the other attendant. I sit down on a hard plastic rocker 
between Vince and the other atten., but slightly behind them. The other atten. still 
doesn’t pay attention to me. Vince doesn’t introduce me to him. 

The smell of feces and urine is quite noticeable to me, but not as pungent as usual. 

I, along with the attendants and perhaps five or six residents, am sitting in front 
of the TV, which is attached to the wall about eight feet off the floor and out of the 
residents’ reach. 

Many of the 70 or so residents are sitting on the wooden benches which are in a 
U-shape in the middle of the dayroom floor. A few are rocking. A couple of others 
are holding onto each other. In particular, Deier is holding onto the resident the 
attendants call “Bunny Rabbit.” (O.C. Deier is assigned to “Bunny Rabbit”—to keep 
a hold of him to stop him from smearing feces over himself.) 

A lot of residents are sitting on the floor of the room, some of these are leaning 
against the wall. A few others, maybe 10, just seem to be wandering around the 
room. (Taylor & Bogdan, 1998: 266) 

This extract illustrates the richness and vividness of observational data, but also its 
wide-ranging and unfocused nature. It is difficult at first reading to know how to 
make sense of it all. 

Ethical Issues 

Two major ethical issues arise with participant observation: whether the observation 
should be overt or covert, and what to do when you observe illegal or immoral acts. 
These issues also may occur in quantitative observation, but have been more salient in 
the participant observation literature. (We discuss ethical issues in general in Chapter 10.) 

There are many examples in the literature where the observers concealed the fact 
that they were conducting research. This was usually done in settings where reactivity 
of measurement would have been a major problem. Two well-known historical 
examples are Humphreys’s (1970) study of homosexual activity in men’s public lavatories, 
which generated an enormous debate over its ethics, and Rosenhan’s (1973) 
pseudopatient study. Researchers conducting covert observations argue that the nature of 
their research precludes their asking the consent of those being observed and that 
their findings bring benefits that justify the deception. However, such deception is 
contrary to the ethical principle of informed consent and lays the researcher open to 
charges of being a spy or a voyeur. Proposed research involving covert observation 
should be subjected to thorough consultation on its ethical status. 

A related issue is what to do in cases where you observe illegal or immoral acts. For 
example, in the above study of state institutions for the mentally retarded, Taylor and 
Bogdan (1998) observed attendants beating and abusing residents. They argue that 
stepping out of the observer role could have had some short-term benefits, in that they 
may have been able to stop the specific instance of abuse. On the other hand, it would 
have effectively terminated their project, which had the potential to end the abuse by 
documenting it and possibly leading to permanent changes in the institutional 
structure to prevent future abuse. In cases like this, researchers face complex dilemmas to 
which there are no clear-cut answers. In such situations, it is always important to seek 
out research colleagues or supervisors to help you explore the ethical issues. 

Quality of the Data 

Finally, participant observation raises some specific issues of reliability and validity: 

• Reliability. It is hard to check observer accuracy in participant observation. 
Although it is theoretically possible to have two or more simultaneous observers 
in the setting, this is rarely done in practice. It is, however, quite possible to 
replicate an observation in several settings at once, as was done in Rosenhan’s (1973) 
pseudopatient study. Reliability in participant observation can also be examined 
by considering the consistency of behavior across time. 

• Observer bias. As we have discussed above, all observation is to some extent biased, 
in the sense of being governed by previous understandings and expectations, 
whether consciously by theory, or unconsciously by ethnocentricity or general 
worldview. These biases affect how observers see things; reports from informants 
may also be biased because of their own particular perspectives and special interests 
(Kurz, 1983). 

• Reactivity. The presence of an observer may alter the behavior of those being 
studied. This reactivity problem is not unique to qualitative approaches; it occurs 
with all types of observation. Participant observers may be able to mitigate it by 
allowing time for the people in the setting to become accustomed to them. Some 
researchers try to get around this problem by conducting covert observations, but 
this of course raises ethical problems (see above). 


Text-based Research 

The second area of qualitative observation is text-based research. Texts of written or 
spoken forms of communication provide the basis for a loosely organized set of 
research approaches that we have referred to as language-based approaches (see 
Chapter 5). These texts include transcripts of conversations, official documents, 
television broadcasts, and newspaper articles. 

Such methods are not new. There is a long tradition of discourse analysis within 
sociology, heavily influenced by linguistics (Labov & Fanshel, 1977; Potter & 
Wetherell, 1987; Sudnow, 1972). In the middle of the 20th century, some 
psychologists (e.g., Allport, 1942) advocated “personal documents” research, although this 
did not become a mainstream avenue for research at the time. 

Text-based approaches involve a close study of the text under examination: the 
focus is on its structure, or its underlying assumptions and meanings, rather than on 
what it is supposed to be describing. This differs from self-report in that the intention 
is to analyze the text as a sample of communication, rather than to understand what 
the speaker or author is thinking or feeling. 

Sources of Texts 

Studies using these approaches can draw on a wide range of possible sources: 

• Personal documents. The classic personal documents approach collected letters, 
diaries (e.g., of language acquisition), or other personal accounts (e.g., William 
James’s, 1902, Varieties of Religious Experience). 

• Administrative records or archives can be used, for example, court records, and 
also, within the limits of confidentiality and consent, clinical case records including 
intake reports and contact notes (e.g., Todd, Kurcias, & Gloster, 1994). 

• Cultural texts include widely disseminated published records. These could be 
self-help books, political speeches, entertainment media (e.g., TV, movies, computer 
games), or educational texts. 

• Visual representations may include photographs, advertisements, and home videos. 

• Naturally occurring interactions. Researchers may be interested in ordinary, 
everyday conversations (e.g., children’s talk in a school playground) or specialized 
ones (e.g., psychotherapy interactions or telephone helplines). 

• Collected or “found” examples of language usage include slang and metaphor, for 
example, to examine how people talk about mental illness. Researchers can also 
collect examples of a phenomenon of theoretical or practical interest. This is a 
heritage from natural history and linguistics in which research was done by 
collecting examples (e.g., Freud’s collection of “slips of the tongue”). 

• Invited or constructed texts are set up or solicited by the researcher. For example, 
the researcher might ask participants to write personal accounts of difficult 
experiences, or might set up a family interaction task (e.g., to generate parent-teenager 
conversations). Qualitative interview transcripts can also be approached from a 
discourse rather than a self-report point of view (i.e., with a focus on how things 
are talked about, rather than on the content per se). 

Examples 

Historical research is the prototypical example of text-based research. From the point 
of view of psychology, historical research can be valuable for bringing the past alive, 
in order to help us understand the historical context of important ideas in psychology 
(e.g., the origin of psychoanalysis, J. Schwartz, 1999), for exemplifying important 
psychological processes in the lives of individuals (e.g., Erikson’s (1969) 
psychobiography of Gandhi), and for showing the psychological influence of the past on people’s 
current experiences (Zeldin, 1994). Historical research generally makes use of multiple 
sources (e.g., the range of types of texts mentioned above), and in some cases, 
qualitative interviews (as in oral history). The power of careful historical analysis is 
illustrated by Runyan (1982), in his analysis of the large number of explanations for 
why the painter Vincent van Gogh cut off his ear. By careful analysis of historical 
records, Runyan was able to rule out all but a few explanations as being inconsistent 
with what is known about van Gogh’s life. 

Three other examples illustrate the wide variety of discourse analytic studies, 
particularly in relation to clinical psychology. The first is Madill and Barkham’s (1997) study 
of a successful case of psychodynamic psychotherapy, which we outlined in Chapter 5. 
The second is Labov and Fanshel’s (1977) classic study: a book-length report which 
analyzed a single 15-minute segment of a psychotherapeutic interview. They used 
microanalytic methods to examine both the content of the speech and also its 
paralinguistic features, such as voice spectrogram patterns. They revealed how much rich 
meaning is carried in subtle, barely noticeable variations in speech, and demonstrated 
the complex nature of the mutual responsiveness between client and therapist. 

The third example is Harper’s (1994) study of how five mental health professionals 
used the term “paranoia.” His analysis identified a number of discourses, or systematic 
ways of talking, about paranoia. For example, these included the “empiricist” account, 
where lists of characteristics and symptoms were the focus, and the “contingent” 
account, where personal and social values were acknowledged in interviewees’ 
discussions. These discourses, and the ways in which professionals moved between them, 
seemed to serve particular functions, such as the assertion of professional legitimacy. 


QUANTITATIVE OBSERVATION 


• Quantitative observation involves the systematic counting or timing of 
specified behaviors. 

• A clear operational definition of the behaviors is essential (but not always easy). 

• Lower levels of inference usually lead to better reliability. 

• There are several different methods of conducting observations, for example, 
interval recording and time sampling. 

• The observers (also known as raters or judges) need to receive careful 
training and monitoring throughout the study. 


The essence of quantitative observation (aside, of course, from its using numbers) 
is that the variables being observed and the methods for observing them are explicitly 
defined. It is characterized by the use of predefined behavior codes by trained 
observers (also called raters, coders, or judges) of demonstrated reliability. Quantitative 
observations are usually targeted at a small number of prespecified behaviors, although 
they sometimes can be more wide-ranging. 

For instance, researchers observing aggression on a children’s playground must 
specify precisely what constitutes and what does not constitute an aggressive act: 
when, for example, does a touch become a push or a punch? They must also specify 
which aspects of such acts will be recorded, for instance, type, frequency, or intensity. 
Thus, compared to qualitative observation, quantitative methods represent a gain in 
precision at the expense of a narrowing of scope and context. 

Quantitative observations can be used to address several different types of research 
question. For example, they can be used for questions of description (e.g., which types 
of verbal response modes are used in child psychotherapy?), for sequential analysis (e.g., 
which types of client response are most likely to follow a therapist interpretation?), and 
for questions of covariation (e.g., does observer-rated empathy correlate with treatment 
outcome?). A further use is the assessment of therapist adherence or competence, to 
answer evaluation questions (e.g., is this therapist delivering this type of treatment at an 
adequate level?). 
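The sequential-analysis question above (which client responses follow a therapist interpretation?) comes down to counting transitions between consecutive codes. The following is a minimal Python sketch, assuming each session has already been coded into a single ordered list of codes; the code labels and the session are invented for illustration, and the function name is hypothetical. It computes only simple lag-1 conditional probabilities, not the significance tests used in formal sequential analysis.

```python
from collections import Counter


def lag1_conditional_probs(codes, given):
    """Estimate P(next code | current code == given) from one coded sequence.

    `codes` is the list of behavior codes for a session, in the order in
    which they occurred. Returns a dict mapping each code that followed
    `given` to the proportion of times it did so.
    """
    followers = Counter(nxt for cur, nxt in zip(codes, codes[1:]) if cur == given)
    total = sum(followers.values())
    if total == 0:
        return {}  # `given` never occurred (or only as the final code)
    return {code: n / total for code, n in followers.items()}


# Hypothetical coded session: "T_" marks therapist codes, "C_" client codes.
session = [
    "T_question", "C_description",
    "T_interpretation", "C_insight",
    "T_reflection", "C_description",
    "T_interpretation", "C_disagreement",
    "T_interpretation", "C_insight",
]

probs = lag1_conditional_probs(session, "T_interpretation")
for code, p in sorted(probs.items()):
    print(f"{code}: {p:.2f}")
```

With this invented session, the three interpretations are followed by C_insight twice (0.67) and by C_disagreement once (0.33). In a real study such proportions would be pooled over many sessions and compared with the base rates of each client code.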


Background 

Historically, quantitative observation methods developed independently in four 
different applied areas: behavioral observation, psychotherapy process research, 
developmental psychology, and content analysis in communication. However, despite differences 
in language and underlying philosophy, many of the same methodological issues apply 
in all four areas. We will mostly draw on examples from behavioral observation, as that 
is where the method is most systematically articulated. 

Behavioral observation has its conceptual roots in methodological behaviorism, which 
argues that psychology should restrict itself to observable behavior (see Chapter 4). 
Also, Mischel’s (1968) argument, that the validity of traditional, trait-based assessment 
procedures was unacceptably low, gave an impetus to the development of practical 
methods for behavioral assessment in the clinical context. These methods attempt to 
eliminate inferences to internal constructs (Goldfried & Kent, 1972). There is now a 
substantial practical literature on behavioral observation in clinical work (e.g., Bellack & 
Hersen, 1988; Hayes et al., 1999; Haynes & Heiby, 2004; Haynes & O’Brien, 2000). 
Since, for behaviorists, research and practice are closely related, many of the procedures 
can equally well be applied in research. 

Psychotherapy process research began with the work of Carl Rogers and the client- 
centered group in the 1940s and 1950s. These were the first researchers to study 
recordings of actual therapeutic interactions, and the first to quantify aspects of the 
therapeutic relationship, such as therapist empathy (Kirschenbaum, 1979). 
Subsequent investigators have examined an enormous number of different process 
variables, ranging from global constructs, such as the quality of the therapeutic 
alliance, to specific types of responses used by the therapist and client (Greenberg & 
Pinsof, 1986). 

Developmental psychology researchers often employ observational methods, partly 
for the obvious reason that infants and very young children have insufficient language 
abilities to use self-report methods, but also as a way of assessing aspects of 
parent-child interactions that cannot be obtained via self-report (e.g., parental sensitivity). 
The most famous example is the Strange Situation Test (Ainsworth et al., 1978), 
which involves briefly separating an infant from his or her parent and then observing 
their dyadic behavior when they are reunited. From these observations the child can 
be placed in one of four attachment categories (e.g., secure, anxious-ambivalent) with 
respect to the parent in question. 
Content analysis arose out of mass media communication research, which uses such 
material as newspapers or transcriptions of broadcasts as its subject matter (Joffe & 
Yardley, 2004; Krippendorff, 2013; Neuendorf, 2002). For example, newspaper 
stories about mental illness might be content-analyzed according to the underlying 
etiological model they espoused. However, the raw material need not be restricted to 
the mass media. Content analysis can be used with self-report data, transcriptions of 
meetings, etc. For example, Fewtrell and Toms (1985) used content analysis to classify 
the discussion in psychiatric ward rounds into categories such as medical treatment, 
mental state, and social adjustment. Content analysis provides a useful means of 
bridging quantitative and qualitative approaches, in that it applies quantitative analysis 
to verbal (qualitative) descriptions (see Chapter 5). 
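Once verbal material has been coded, the quantitative step of a content analysis is often simple tabulation of category frequencies. A minimal Python sketch, assuming each discussion segment has already been assigned exactly one category; the category labels echo Fewtrell and Toms's (1985) examples, but the data are invented for illustration.

```python
from collections import Counter

# Hypothetical data: each discussion segment from one ward round has been
# assigned a single category by a trained coder.
coded_segments = [
    "medical treatment", "mental state", "medical treatment",
    "social adjustment", "medical treatment", "mental state",
]

counts = Counter(coded_segments)
total = len(coded_segments)

# Report each category's frequency and its share of all coded segments,
# most frequent first.
for category, n in counts.most_common():
    print(f"{category:<18} {n:>2}  ({n / total:.0%})")
```

The counting itself is trivial; the substantive work, as with other quantitative observation, lies in defining the categories and showing that different coders assign segments to them reliably.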


Procedures for Conducting Observations 

As we discussed in Chapter 6 in the context of self-report measures, it is usually better 
to use an existing measure than to attempt to develop your own. Measure development 
is time-consuming and difficult, and it is hard to publish work with unfamiliar 
measures. This is equally true in the context of quantitative observation; if at all 
possible, it is better to use an existing coding manual and rating scheme with established 
inter-rater reliability. We discuss observational measures here from the viewpoint of 
the researcher developing a measure; this viewpoint is taken partly in order to clarify 
the process involved in measure development, and partly to provide guidelines for 
measure development research. 

Operational Definitions 

The first step in quantitative observation is to operationally define the behavior to be 
observed. The goal is to specify the behavior sufficiently well, so that it can be observed 
with high inter-rater reliability. Often this means that the behavior should be defined 
so that it can be rated without the raters having to make large inferences, but for some 
variables this may not be possible. Giving clear definitions is harder than it seems, as 
even apparently simple behaviors such as head nods or eye contact, or giving advice in 
therapy, pose difficulties in delineation. More inferential constructs, such as the level 
of empathy offered by a therapist, are even more difficult to define. 
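Inter-rater reliability for categorical codes is commonly summarized with percentage agreement and with Cohen's kappa, which corrects agreement for chance. The text does not prescribe a particular statistic, so the following is only a minimal Python sketch, assuming two raters have each assigned one nominal code per observation unit; the ratings are invented, using the playground-aggression example ("A" = aggressive act, "N" = not aggressive), and the function names are hypothetical.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Proportion of units to which both raters assigned the same code."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa = (p_o - p_e) / (1 - p_e) for two raters, nominal codes.

    p_o is observed agreement; p_e is the agreement expected by chance,
    computed from each rater's marginal code frequencies.
    """
    p_o = percent_agreement(rater_a, rater_b)
    n = len(rater_a)
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if p_e == 1:
        # Both raters used one and the same code for every unit.
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ratings of 10 playground episodes by two trained observers.
rater_a = ["A", "A", "N", "N", "A", "N", "N", "N", "A", "N"]
rater_b = ["A", "N", "N", "N", "A", "N", "A", "N", "A", "N"]

print(f"agreement = {percent_agreement(rater_a, rater_b):.2f}")
print(f"kappa     = {cohens_kappa(rater_a, rater_b):.2f}")
```

Here the raters agree on 8 of 10 episodes (0.80), but chance agreement from their marginal frequencies is 0.52, so kappa is only about 0.58. This gap is why chance-corrected indices are usually preferred to raw agreement. Kappa is one option among several; continuous ratings, for example, call for other indices such as the intraclass correlation.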

Developing a good definition is an incremental process. It is often useful to start 
with informal qualitative observation, supplemented by a review of the literature on 
the variable of interest and similar observational measures. The researcher then 
develops an initial version of the codes and tries them out on some data. This leads to 
revision of the codes and an iterative cycle of testing and revision. When the researcher 
has a coding system that he or she can use, the next step is to attempt to teach the 
codes to raters, who then test them out on data. This leads to a further cycle of testing 
and revision, which improves the likelihood that others besides the researcher will be 
able to use the measure (a form of inter-observer generalizability referred to as 
portability). Finally, the researcher utilizes the measure in a study, the results of which may 
suggest yet more revisions, and so on. 

Since many different dimensions of behavior can be examined, it is useful to have a
framework to help guide one’s choices. Table 7.1 gives one such framework, adapted
from research on psychotherapy process (Elliott, 1991). Similar schemes could easily
be constructed for other content areas of observation, e.g., children’s behavior in the
classroom or family interaction. Which aspects of which dimensions are important
depends partly on the variables being observed and partly on the research questions.

Table 7.1

Sequential Phase: What is the temporal or functional orientation taken toward a unit of process (i.e., toward what happened before, during, and after the unit)?

a. Context (“antecedents”): What has led up to a unit of process (e.g., previous speaking turn, earlier relationships)?

b. Process (“behaviors”): The process that is targeted for study at a given level (unit).

c. Effects (“consequences”): The sequelae of a unit of process (e.g., reinforcement, treatment outcome).

Methods of Observation 

Having specified the dimensions of the behavior to be observed, the next step is to 
choose an observational method. There are several choices (Bakeman & Quera, 2011; 
Cone, 1999; Haynes & O’Brien, 2000; Ostrov & Hart, 2013). 

• Narrative recording, that is, writing an account of what happens, is equivalent to
qualitative observation. It is used in the behavioral observation and ecological psychology
traditions (e.g., Bakeman & Quera, 2011; Barker et al., 1978). It is useful
for generating hypotheses, developing measures, and arriving at ideas about
causal relationships (in behavioral terms, the antecedents, behaviors, and consequences).
It is also good for low-frequency behaviors. However, it is difficult to
assess the reliability of such observations. Narrative recording is often a preliminary
step to developing more structured methods of observation.

• Event recording yields the simplest form of frequency data. The observer counts 
every occurrence of the behavior within the entire observation period. For example, 
if the observation is focusing on therapist response modes used during a 50-minute 
therapy session, the final frequency count might be 35 questions, 32 reflections, five 
advisements, four interpretations, and one self-disclosure. The advantages of event 
recording are that it is simple and that it can be done alongside other activities; the 
disadvantages are that you cannot analyze sequences or other complexities and that 
it is hard to keep observer attention or assess observer reliability at the event level. 

• Interval recording divides the observation period into equal intervals (e.g., a 50-minute
therapy session might be divided into 10 five-minute intervals), and the occurrence of
the behavior is recorded for each one. In whole interval sampling, the
behavior is only recorded if it is present for the whole of the interval, as opposed to
partial interval sampling, when it can be present for any part of the interval. The
advantages of interval recording are that it allows you to analyze sequences and
that it gives a rudimentary estimate of both the frequency and the duration of a
behavior. It may be adapted to record several behaviors concurrently. Having timed
intervals also helps keep the observers alert. The disadvantage is that it requires
more observer effort, as you have to attend to timing as well as to the behavior.

• Time sampling. Observations are made only at specific moments of time, for 
example, every five minutes or every half-hour. When observing large groups, 
scan sampling can be used, where each member of the group is observed 
sequentially. For example, Hinshaw, Henker, Whalen, Erhart, and Dunnington 
(1989) used scan sampling to observe the social interaction of hyperactive 
boys. The advantages of time sampling are that it yields a direct measure of the 
prevalence of a behavior in a group and that it is good for high-rate, continuous
behaviors; it also means that raters do not have to maintain continuous
attention. The disadvantage is that low-frequency behaviors may be missed, as 
they might only occur between the observation times. 
• Sequential act coding records events in the order in which they occur. In contrast
to event recording, it usually requires a comprehensive coding system to cover all 
possible events. (Event recording may just focus on one or two events, for 
example, specific aggressive acts in a school classroom.) To take a simplified 
example, researchers may classify events in a therapeutic interaction into client 
speech (C), therapist speech (T), and silence (S). Then a sequential act coding 
record might look like this: C,T,S,C,S,C. ... This strategy is ideal for sequential 
analysis, because it relies on natural units (such as talking turns), not artificial 
units (such as time segments). However, disagreements on where the units begin 
and end can complicate reliability, and the method is inefficient if you are not 
interested in sequences. 

• Duration recording is similar to sequential act coding, except that the focus is
on timing the occurrence of a single behavior rather than categorizing events
into codes. You can measure both duration, the interval between the start and
the end of each behavior, and latency, the interval between behaviors. For
example, Brock and Barker (1990) used this method to study the amount of 
“air time” taken up by each staff member during team meetings in a psychiatric 
day hospital. 

• Global rating scales, in which the observer makes an overall judgment, often of the
quality of the behavior, are usually based on a long period of observation. Clinical
examples include the Brief Psychiatric Rating Scale (BPRS: Overall & Gorham,
1962), which rates several dimensions of psychiatric symptomatology, and the
Global Assessment Scale (GAS: Endicott et al., 1976), which rates overall psychiatric
impairment. Global ratings, for example, of therapist treatment adherence or
competence, have often been used in therapy process research; one example is the
Revised Cognitive Therapy Scale (CTS-R: Blackburn et al., 2001). Even when they
use behaviorally anchored rating scales, global ratings are still less precise than the
behavioral observation methods, in that the observer is being asked to quantify an
impression or judgment. On the other hand, global ratings are useful for complex
or inferred constructs and can provide useful summaries of events. Many global
rating scales have acceptable reliability.

• Environmental measures. Finally, an interesting category of observation focuses
on the psychological environment as a whole, rather than specific individuals
within it. Procedures include behavioral mapping, where the observers record the
pattern of activity in a given environment. For example, Kennedy, Fisher, and 
Pearson (1988) used behavioral mapping to study the patterns of patient and staff 
activity in a spinal cord injury unit over the course of a single day. 

Environmental observation may also involve the use of unobtrusive measures 
(Webb, Campbell, Schwartz, & Sechrest, 1966), in which features of the physical 
environment are used to yield data on patterns of activity. Classic examples of 
unobtrusive measures are using the wear and tear on a carpet as an index of the 
popularity of museum exhibits, and using the accretion of graffiti as an index of 
youth gang activity. Italian drugs researchers (Zuccato et al., 2005) have cleverly 
applied this idea in order to estimate the level of general population cocaine usage, 
by measuring the concentration of the drug’s urinary metabolites in the waters of 
the River Po. 
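To make the differences among these methods more concrete, the following Python sketch scores a single hypothetical timed record of therapist responses in four of the ways described above: event recording, partial- and whole-interval recording, momentary time sampling, and duration recording. The record, the response codes, and all function names are invented for illustration only; it is a sketch of the underlying arithmetic, not a substitute for an established coding manual or observation software.

```python
from collections import Counter

# Hypothetical timed record of a 10-minute therapy excerpt:
# (start_second, end_second, code). Codes and times are invented.
LOG = [
    (0, 40, "question"),
    (70, 130, "reflection"),
    (200, 210, "question"),
    (290, 420, "reflection"),
    (500, 520, "advisement"),
]
SESSION_LEN = 600  # seconds


def event_counts(log):
    """Event recording: total frequency of each code over the whole period."""
    return Counter(code for _, _, code in log)


def interval_record(log, code, session_len, interval_len, whole=False):
    """Interval recording: one True/False score per equal interval.

    Partial-interval (default): True if the behavior occurs at any point.
    Whole-interval: True only if the behavior fills the entire interval.
    Assumes occurrences of the same code never overlap one another.
    """
    scores = []
    for lo in range(0, session_len, interval_len):
        hi = min(lo + interval_len, session_len)
        # Length of each occurrence that falls inside [lo, hi)
        overlaps = [min(e, hi) - max(s, lo) for s, e, c in log
                    if c == code and s < hi and e > lo]
        if whole:
            scores.append(sum(overlaps) >= hi - lo)
        else:
            scores.append(len(overlaps) > 0)
    return scores


def time_sample(log, code, session_len, every):
    """Momentary time sampling: is the behavior in progress at each sampled moment?"""
    return [any(c == code and s <= t < e for s, e, c in log)
            for t in range(0, session_len, every)]


def durations_and_latencies(log, code):
    """Duration recording: length of each occurrence and the gaps between occurrences."""
    spans = sorted((s, e) for s, e, c in log if c == code)
    durations = [e - s for s, e in spans]
    latencies = [s2 - e1 for (_, e1), (s2, _) in zip(spans, spans[1:])]
    return durations, latencies


print(dict(event_counts(LOG)))  # {'question': 2, 'reflection': 2, 'advisement': 1}
partial = interval_record(LOG, "reflection", SESSION_LEN, 60)
whole = interval_record(LOG, "reflection", SESSION_LEN, 60, whole=True)
print(sum(partial), sum(whole))  # 5 2
print(sum(time_sample(LOG, "reflection", SESSION_LEN, 60)))  # 3
print(durations_and_latencies(LOG, "reflection"))  # ([60, 130], [160])
```

In this invented record, reflections actually occupy 190 of the 600 seconds (about 32%). Partial-interval scoring credits them in 5 of the 10 one-minute intervals and whole-interval scoring in only 2, while momentary sampling catches them at 3 of 10 moments. This illustrates a general property worth keeping in mind: partial-interval recording tends to overestimate, and whole-interval recording to underestimate, how much of the time a behavior is present.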
Mechanics

The mechanics of recording the observations need to be as simple as possible, so that 
the recording does not interfere with making the observations themselves. Possible aids 
include coding sheets, stopwatches, counters, and electromechanical devices, including 
computer software (e.g., ObsWin, ODLog, or The Observer XT: see Kahng & Iwata, 
1998). The observations may be conducted in real time, or the interactions may be 
audio or video recorded for subsequent observation and analysis. 

It is not always necessary or possible for the researchers to conduct the observations. An 
alternative is for participants to carry out the observations themselves, using self-monitoring 
methods (Korotitsch & Nelson-Gray, 1999; Piasecki, Hufford, Solhan, & Trull, 2007). 
For example, an evaluation of couples’ therapy might include the participants keeping 
written records of the number and type of arguments that they have over the course of 
several weeks. Self-monitoring can also be done by proxy: parents, for instance, could keep 
records of their child’s sleep problems. The advantage of self-monitoring is that it allows 
the researcher to obtain observational data over long time periods and also from private 
settings. It can also be adapted to monitor cognitions or feelings, using ecological momentary
assessment procedures (Shiffman, Stone, & Hufford, 2008). For example, in a study
of obsessive-compulsive disorder, participants might be asked to record their moment-by-moment
intrusive thoughts in real time in their natural environment.

If you have sequential data from your observations, for example, if you use interval 
or time-sampling methods, you can do more complex analyses of how the behaviors 
develop over time. This is a technical topic involving special statistics; Gottman and 
Roy (1990) describe some of the options. 
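Gottman and Roy’s procedures go well beyond what can be shown here, but the usual first step is easy to illustrate. The Python sketch below takes a hypothetical sequential act record using client speech (C), therapist speech (T), and silence (S) codes, counts how often each code is followed by each other code (lag-1 transitions), and converts the counts into conditional probabilities. The sequence and the function names are assumptions for illustration.

```python
from collections import Counter

# Hypothetical sequential act record: C = client speech, T = therapist speech, S = silence.
SEQUENCE = list("CTCTSCTCSC")


def lag1_counts(seq):
    """Count each (antecedent, consequent) pair of adjacent codes."""
    return Counter(zip(seq, seq[1:]))


def transitional_probabilities(seq):
    """Estimate P(next code = b | current code = a) from lag-1 counts."""
    pairs = lag1_counts(seq)
    # How often each code appears as an antecedent (i.e., is followed by something)
    antecedents = Counter(a for a, _ in pairs.elements())
    return {(a, b): n / antecedents[a] for (a, b), n in pairs.items()}


probs = transitional_probabilities(SEQUENCE)
for (a, b), p in sorted(probs.items()):
    print(f"P({b} | {a}) = {p:.2f}")
```

In this short record a client turn is followed by a therapist turn three times out of four. A full sequential analysis would go on to ask whether such conditional probabilities depart from what the codes’ overall base rates would lead one to expect by chance, which requires the specialized statistics referred to above and far longer records than this one.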

Reliability and Validity Issues 

An advantage of quantitative observation methods is that they facilitate the calculation
of reliability (see Chapter 4 for a discussion of the statistical aspects of assessing inter-rater
reliability). One practical problem is observer drift, where observers start out with
high reliability, but then tend to develop idiosyncratic rules or become careless as the
observation proceeds. To prevent this from occurring, it is important to monitor
the observers’ reliability continually.
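One simple way to put continual monitoring into practice is to have each rater periodically code material that has also been coded by a criterion rater, and to track agreement session by session. The Python sketch below does this with hypothetical codings; the session data, the function names, and the 0.80 cutoff are illustrative assumptions rather than recommended standards. Percent agreement is used only because it is easy to follow; it does not correct for chance agreement.

```python
# Hypothetical check codings: for each session, a rater's codes and a
# criterion coding of the same units. The 0.80 cutoff is an illustrative
# assumption, not a standard taken from the text.
SESSIONS = [
    ("CTSCTC", "CTSCTC"),
    ("CTSCTT", "CTSCTC"),
    ("CTTCTT", "CTSCTC"),
]


def percent_agreement(codes, criterion):
    """Proportion of units on which the rater matches the criterion coding."""
    if len(codes) != len(criterion) or not codes:
        raise ValueError("codings must be non-empty and of equal length")
    return sum(x == y for x, y in zip(codes, criterion)) / len(codes)


def flag_drift(sessions, cutoff=0.80):
    """Return (session_number, agreement) for each session below the cutoff."""
    flagged = []
    for number, (codes, criterion) in enumerate(sessions, start=1):
        agreement = percent_agreement(codes, criterion)
        if agreement < cutoff:
            flagged.append((number, agreement))
    return flagged


for number, (codes, criterion) in enumerate(SESSIONS, start=1):
    print(number, round(percent_agreement(codes, criterion), 2))
print("Sessions to review with the rater:", flag_drift(SESSIONS))
```

Here agreement falls from 1.00 to 0.83 to 0.67, so only the third session is flagged; the downward trend across sessions is itself a warning sign of drift that would be worth raising with the rater before it crosses any cutoff.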

The main validity issue, aside from problems with the operational definition of the 
variables, is the reactivity of observation. As we discussed above, in the context of qualitative 
observation, the act of observing may alter the behavior being observed. The only solution
is to make the observations as unobtrusive as possible, and to allow time for the
people being observed to become habituated to the observers’ presence. This may be 
easier with qualitative observation, which is usually done to a more leisurely timetable. 

Practical Suggestions for Working with Raters 

Researchers have various strategies available for maximizing the reliability and validity 
of observer ratings. These include: 

• design or selection of measures with clear, well-defined variables and good examples 
of categories; 

• careful selection of raters; and 

• thorough training and management of raters. 
Some suggestions on how best to work with raters are given below (see also
Cone, 1999; Hartmann, Barrios, & Wood, 2004; Moras & Hill, 1991). Many of 
these considerations also apply to the use of multiple analysts in qualitative research 
(Hill, 2012). 

Rater selection. It is usually better to work with motivated volunteers, such as top 
students in advanced undergraduate seminars, who are interested in a career in clinical 
psychology. Occasionally you may have to drop a rater’s data later due to consistent 
unreliability, so it is best to start out with at least three and preferably four raters. 

Training. It is a good idea to begin with a didactic presentation and modeling of 
the rating process, followed by group rating. This is followed by extensive practice, 
including weekly feedback on progress and problems. The SPSS Reliability procedure 
offers useful analyses for this, providing solid evidence both of progress and of problems,
which can be shared with the group. For example, reliability checks can reveal:

• which categories or dimensions show reliability problems; 

• whether a reliability problem is general (spread across all raters) or specific 
(restricted to one or two raters); 

• if a rater has misunderstood a category; 

• if two raters have formed a clique which sets them apart from everyone else; and 

• if particular raters differ greatly in their base rates for a category. 
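The text refers to the SPSS Reliability procedure; purely as an illustration of the kind of information such checks provide, the Python sketch below computes percent agreement for every pair of raters and each rater’s base rate for one category. The five raters, their codes, and the function names are hypothetical.

```python
from itertools import combinations

# Hypothetical practice codings of the same 8 therapist responses by five
# trainee raters (Q = question, R = reflection, A = advisement).
RATINGS = {
    "Ann": "QRRAQRQR",
    "Ben": "QRRAQRQQ",
    "Cal": "QRRAQRQA",
    "Dee": "QARAQAQA",
    "Eve": "QARAQAQA",
}


def pairwise_agreement(ratings):
    """Percent agreement for every pair of raters."""
    return {
        (r1, r2): sum(x == y for x, y in zip(ratings[r1], ratings[r2])) / len(ratings[r1])
        for r1, r2 in combinations(sorted(ratings), 2)
    }


def base_rates(ratings, category):
    """Proportion of units each rater assigns to one category."""
    return {rater: codes.count(category) / len(codes) for rater, codes in ratings.items()}


for pair, agreement in pairwise_agreement(RATINGS).items():
    print(pair, f"{agreement:.3f}")
print(base_rates(RATINGS, "A"))
```

In these invented data, Dee and Eve agree perfectly with each other but only 62.5% to 75% with the other three raters, and both assign the advisement code to half the responses, compared with 12.5% to 25% for everyone else: the kind of clique and base-rate pattern that the reliability checks listed above are meant to reveal.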

Training should continue until ratings on all variables reach an acceptable standard
of reliability. The statistic used to assess reliability depends on the scale of measurement.
For nominal scale data, Cohen’s kappa (which corrects for chance agreement) is used;
for interval data, Pearson’s correlation can be used for evaluating the reliability of
single raters compared to other raters (Bakeman & Quera, 2011; see also Chapter 4).
When ratings are to be combined across raters, Cronbach’s alpha or an intra-class correlation
coefficient can be used to estimate the reliability of the combined ratings
(Bakeman & Quera, 2011; Shrout & Fleiss, 1979).
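For readers who want to see what these coefficients compute, the Python sketch below implements Cohen’s kappa for two raters’ nominal codes, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e the agreement expected by chance from each rater’s marginal proportions. It also implements Cronbach’s alpha for combined ratings, treating raters as items: alpha = k/(k - 1) × (1 - sum of the individual raters’ variances / variance of the summed ratings). The codes, ratings, and function names are hypothetical, and in practice one would normally use a statistics package.

```python
from collections import Counter
from statistics import variance


def cohens_kappa(r1, r2):
    """Cohen's kappa for two raters' nominal codes of the same units."""
    n = len(r1)
    p_observed = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    # Agreement expected by chance from each rater's marginal proportions
    p_chance = sum((c1[k] / n) * (c2[k] / n) for k in set(c1) | set(c2))
    return (p_observed - p_chance) / (1 - p_chance)


def cronbach_alpha(ratings_by_rater):
    """Alpha treating raters as items; each row is one rater's scores for the same targets."""
    k = len(ratings_by_rater)
    totals = [sum(scores) for scores in zip(*ratings_by_rater)]
    sum_rater_vars = sum(variance(scores) for scores in ratings_by_rater)
    return k / (k - 1) * (1 - sum_rater_vars / variance(totals))


# Hypothetical nominal codes (Q/R/A) from two raters for 10 responses
rater1 = "QQRRRAQRQA"
rater2 = "QRRRRAQRQQ"
print(f"kappa = {cohens_kappa(rater1, rater2):.2f}")  # kappa = 0.68

# Hypothetical 1-5 empathy ratings by three raters (rows) of five segments (columns)
empathy = [
    [4, 3, 5, 2, 4],
    [4, 2, 5, 3, 4],
    [5, 3, 4, 2, 4],
]
print(f"alpha = {cronbach_alpha(empathy):.2f}")  # alpha = 0.90
```

Here the two raters agree on 80% of responses, but 38% agreement would be expected by chance from their marginal distributions, so kappa is (0.80 - 0.38)/(1 - 0.38), or about 0.68. Alpha estimates the reliability of the summed or averaged ratings of all three raters, not of any single rater; for this kind of two-way layout it is equivalent to the consistency, average-measures form of the intraclass correlation (Shrout and Fleiss’s ICC(3,k)).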

Management. The management and nurturing of raters is at least as important as 
their selection and training. To foster the “research alliance,” communicate to the 
raters that their views will be taken seriously and encourage them to contribute to the 
refinement of the rating system. Regular meetings and feedback during the rating 
process help prevent alienation and produce more reliable and valid data. As far as 
practicable, raters should feel part of the whole research process, including the 
conceptual framework and the research questions (unless it is necessary to keep them 
blind to the hypotheses or questions), and analyses and interpretation; this may occasionally
also include co-authorship, if important contributions are made to the study,
as is the case with multiple qualitative analysts. 


CHAPTER SUMMARY 


This chapter has examined when and how observational methods of measurement 
might be used. The advantage of observation is that it provides a direct measure of 
behavior, thereby overcoming some of the validity problems of self-report that were 
discussed in the previous chapter. It is useful when precise records of behavior are
needed, or for studying behaviors that are not amenable to self-report (e.g., nonverbal
behavior, or physiological responses).

As in self-report, there are both qualitative and quantitative approaches to observation.
The main qualitative approach is participant observation, which derives from
ethnographic approaches in anthropology. Text-based methods can also be considered
as observational in nature.

Quantitative approaches use structured methods to give precise counts of behavior. 
They have been developed in four disparate areas (behavioral observation, therapy
process research, developmental psychology, and content analysis), but the methodological
issues are similar in all applications. There are a number of different methods
for conducting quantitative observation, for example, interval recording and time 
sampling. The choice between them depends on the nature of the research questions 
and the resources needed to make the observations. For all types of observation, 
careful selection, training, and monitoring of raters is important to achieve good 
reliability. 


FURTHER READING 

There is a good treatment of participant observation in several texts (e.g., Angrosino, 
2007; Emerson, 2001; Patton, 2002; Taylor & Bogdan, 1998). We recommend 
perusing some of the classic studies using this method, such as Goffman (1961) and 
Whyte (1943), as they are mostly stimulating and readable. The classic text on personal 
documents methods is Allport (1942); for further discussion of text-based methods, 
see Taylor and Bogdan (1998) and Potter and Wetherell (1987). 

Quantitative observational methods are reviewed by Bakeman and Quera (2011), 
Cone (1999), Haynes and O’Brien (2000), and Ostrov and Hart (2013). Greenberg and 
Pinsof (1986) review measures for use in psychotherapy process research, while Llewelyn 
and Hardy (2001) give a brief introduction to therapy process research in general. 

QUESTIONS FOR REFLECTION 

1. Why is there so little research in psychology using observation as opposed to self-report?
What might be gained by more balance between self-report and observation
research? What could be done to encourage more observation research?

2. What would a participant observation study on your topic look like? What kind 
of setting would you need to enter? Who would you observe? What would you 
look for? What kinds of research questions would this allow you to answer? 

3. Think about doing text-based research on your topic: What kinds of texts might 
bear on your research topic? How could you obtain them? What could you find 
out from them? 

4. How would you go about turning your research topic into a behavioral observation
study? Would measure development be a necessary first step? If so, what
constructs would you operationalize? What situations might be suitable for 
carrying out your observations? 
KEY POINTS IN THIS CHAPTER

• Design, in the sense we are using it here, refers to the logical structure of the study. 

• Nonexperimental designs are ones in which the researcher gathers data 
without making any active intervention. They can be classified into descriptive
and correlational designs, according to the type of analysis conducted.

• The golden rule is “correlation does not equal causation.” 

• There are several possible models of the causal relationships of two or more 
variables, including ones with mediator and moderator variables. 

• Campbell (e.g., Cook & Campbell, 1979) made two major contributions 
to the study of design: (1) the classification of validity types; and (2) the 
analysis of quasi-experimental designs. 

• Cook and Campbell’s four validity types apply to generalized causal inference:
they are statistical conclusion validity, construct validity, internal
validity, and external validity. 

• Experimental designs are ones in which the researcher makes an active intervention
or manipulation. They are classified into randomized and nonrandomized
(quasi-experimental) designs, according to whether or not there is random 
assignment to experimental conditions. 

• There are several commonly used nonrandomized designs. Each has its associated
validity threats.

• Randomized designs facilitate inferences about causality. They are central to 
the discussion of evidence-based practice and empirically supported therapies.
However, they also have limitations and validity threats, and cannot
be regarded as a scientific panacea. 


Research Methods in Clinical Psychology: An Introduction for Students and Practitioners,
Third Edition. Chris Barker, Nancy Pistrang, and Robert Elliott. 

© 2016 John Wiley & Sons, Ltd. Published 2016 by John Wiley & Sons, Ltd. 
Companion Website: www.wiley.com/go/barker 
Previous chapters have covered the groundwork and measurement phases of the
research process; the present one begins our examination of the design phase. This 
order, first measurement then design, roughly corresponds to how you go about 
planning an actual research project: you usually begin by thinking about which variables 
interest you and how to measure them; next you think about design. From now on, 
we will put measurement behind us, working, for didactic purposes, on the assumption 
that it is unproblematic, even though we recognize that this is often not the case. 

To clarify what we mean by the term “design,” think of the questions “what, when, 
where, and who?” about a research project. Measurement is the “what” aspect: what 
is being studied, what measurements are made. Design, in the sense we are using it 
here, denotes “when, where, and on whom” the measurements are taken: the logical 
structure that guides the data collection of the study. It covers such topics as the 
relative merits of large-sample versus single-case studies, what type of control group, 
if any, is required, and who the participants will be. 

The terms “research design” and “experimental design” are also sometimes used in 
a broader sense to denote everything to do with planning and executing a research 
project, synonymously with our use of the term “research methods.” The more 
restricted sense of the term “design” that we are using here is consistent with its use 
in the statistical literature. It is usually clear from the context whether the broader or 
narrower meaning is intended. 

Research designs can be classified into two fundamental types: experimental and 
nonexperimental. Experimental designs involve an active intervention by the researcher,
such as giving one type of therapy to some clients and a second type to others, whereas
nonexperimental designs simply involve measurement, without changing the
phenomenon or situation to be measured. These two approaches to design reflect
“the two disciplines of scientific psychology” (Cronbach, 1957, 1975). Experimentalists
are often more concerned with examining the causal influence of external factors, 
which are amenable to experimental manipulation; nonexperimentalists are often 
more concerned with variation between people. 

This chapter will examine nonexperimental and experimental designs, and their various
subtypes, and will also look at some general principles for assessing validity in
research designs. 


NONEXPERIMENTAL DESIGNS 

Nonexperimental designs can be classified, according to their aims, into descriptive 
and correlational designs. As is obvious from their names, descriptive designs usually 
aim simply to describe, whereas correlational designs aim to examine associations in 
order to make predictions or explore causal linkages. 


Descriptive Designs 

Examples of descriptive studies frequently appear in the mass media: public opinion 
surveys, in which respondents are asked which political party they intend to vote for; 
the national census, which reports, for instance, the percentage of people living in 
various types of accommodation; and national unemployment statistics. However, the
importance of systematic descriptive research is generally overlooked by clinical psychologists,
even though such research is often valuable as a preliminary step in understanding
a phenomenon of interest. Some examples of descriptive studies are:

• Descriptive epidemiological research, which aims to document the incidence and 
prevalence of specified psychological problems. 

• Consumer satisfaction research, which assesses clients’ satisfaction with a 
psychological service. 

• Phenomenological research, which aims to understand the nature and defining 
features of a given type of experience. 

Quantitative descriptive studies report their results using descriptive statistics, naturally 
enough. This is a technical term covering such statistics as percentage, mean, median, 
incidence, and prevalence. However, it is rare to have a purely descriptive study, as 
researchers often want to examine the associations between two or more variables of 
interest. For example, in a consumer satisfaction study you may want to see whether 
there is an association between client satisfaction and various client demographic characteristics,
such as gender or ethnicity. This leads on to the next type of study, the
correlational design. 
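A few of these descriptive statistics are sketched below in Python with invented numbers: the mean, median, and percentage of satisfied clients from a hypothetical consumer satisfaction survey, and point prevalence and one-year cumulative incidence from a hypothetical community survey. Epidemiologists distinguish several variants of incidence (for example, rates per person-years at risk); the sketch shows only the simplest proportion, and the function names are illustrative.

```python
from statistics import mean, median

# Hypothetical consumer-satisfaction ratings (1 = very dissatisfied, 7 = very satisfied)
satisfaction = [6, 7, 5, 6, 3, 7, 6, 4, 7, 6]

print("mean:", mean(satisfaction))        # mean: 5.7
print("median:", median(satisfaction))    # median: 6.0
pct_satisfied = 100 * sum(r >= 6 for r in satisfaction) / len(satisfaction)
print(f"rating 6 or 7: {pct_satisfied:.0f}%")  # rating 6 or 7: 70%


def point_prevalence(existing_cases, population):
    """Proportion of a population who have the problem at one point in time."""
    return existing_cases / population


def cumulative_incidence(new_cases, at_risk):
    """Proportion of people free of the problem at the start who develop it during the period."""
    return new_cases / at_risk


# Hypothetical community survey: 2,000 people, 120 current cases;
# over the following year, 47 of the 1,880 non-cases develop the problem.
print("prevalence:", point_prevalence(120, 2000))            # prevalence: 0.06
print("1-year incidence:", cumulative_incidence(47, 1880))   # 1-year incidence: 0.025
```

Note that the incidence denominator excludes the 120 people who already had the problem at the start, since only those initially free of it are at risk of becoming new cases.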


Correlational Designs 

Correlational studies aim to examine the relationship between two or more variables: 
in technical language, to see whether they covary, correlate, or are associated with 
each other. Such studies are also called passive observation or naturalistic studies, in 
contrast to studies employing active methods of experimental manipulation. (Passive 
observation, as a research design, should not be confused with participant observation,
which is a data-gathering method.) In correlational studies, researchers measure
a number of variables for each participant, with the aim of studying the associations 
among these variables. 

A well-known example of a correlational design is Brown and Harris’s (1978) study 
of the social origins of depression, which looked at the association between women’s 
depression, their experience of stressful life events, and vulnerability factors (such as 
low intimacy with the husband and loss of the mother before the age of 11). 
Correlational designs are often also used to examine individual differences, for 
example, in predicting which clients respond best to a psychological intervention. 
Examining such correlations is a common step in attempts to construct causal explanations.
That is, one typically tries to predict what happens to whom (e.g., in therapy)
in order to understand why it happens (e.g., what are the effective ingredients) or in 
order to improve an application (e.g., to learn how to enhance its outcome). 

Measure development research, which aims to develop, evaluate, or improve measures,
uses both descriptive and correlational designs. As we discussed in Chapter 6,
developing a new measure involves extensive testing of reliability and validity, using a 
correlational framework; it also involves providing normative data for the measure, 
using descriptive methods. 
Correlational designs may be cross-sectional, in which all observations are made at
the same point in time, or they may be longitudinal, in which measurements are made 
at two or more different time points. Correlational studies may use simple statistical 
measures of association, for instance, chi-square and correlation coefficients, or multivariate
methods, such as multiple regression, factor analysis, and log-linear procedures.
They may also use more advanced methods, which aim to map the underlying
structure of complex data sets. These go under various names (path analysis, latent
structure analysis, causal modeling, or structural equation modeling; e.g., Hoyle &
Smith, 1994; Tabachnick & Fidell, 2013; Tomarken & Waller, 2005), but the underlying
logic is the same. They are used for evaluating how well conceptual models generated
from previous research or theory fit the data.

Path analysis is both a method of conceptual analysis and a procedure for testing 
causal models. Its framework is a useful tool for planning out research, even if you 
never actually carry out a formal path analysis, in that it forces you to spell out your 
theoretical model. It is also useful for trying to conceptualize the results of correlational
studies. The essence of path analysis is to tell a story in diagrammatic or flowchart
form, showing which variables influence which others: the examples of different
kinds of causal linkages that are given in the next section are depicted in the form of 
elementary path diagrams. 
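As a minimal sketch of what the numbers attached to the arrows in a path diagram represent, the Python code below simulates data for a three-variable chain, X → M → Y, and estimates each path by ordinary least squares regression. The variable names, the simulated effect sizes (0.6 and 0.5), and the use of unstandardized slopes are assumptions made for illustration; formal path analysis is usually carried out with structural equation modeling software, which also reports standardized coefficients and indices of model fit.

```python
import random


def centered_sums(x, y):
    """Sum of cross-products of deviations from the means (S_xy)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y))


def chain_model_paths(x, m, y):
    """OLS path estimates for the model X -> M -> Y plus a direct X -> Y path."""
    sxx, smm = centered_sums(x, x), centered_sums(m, m)
    sxm, sxy, smy = centered_sums(x, m), centered_sums(x, y), centered_sums(m, y)
    a = sxm / sxx                          # path X -> M
    total = sxy / sxx                      # simple slope of Y on X (total effect)
    det = sxx * smm - sxm ** 2             # normal equations for Y on X and M
    direct = (sxy * smm - sxm * smy) / det  # path X -> Y, controlling for M
    b = (sxx * smy - sxm * sxy) / det       # path M -> Y, controlling for X
    return {"a": a, "b": b, "direct": direct, "indirect": a * b, "total": total}


# Simulated data in which X affects Y only through M (no direct path).
rng = random.Random(1)
n = 5000
X = [rng.gauss(0, 1) for _ in range(n)]
M = [0.6 * x + rng.gauss(0, 1) for x in X]
Y = [0.5 * m + rng.gauss(0, 1) for m in M]

for name, value in chain_model_paths(X, M, Y).items():
    print(f"{name:>8}: {value:.2f}")
```

Because the data were generated with no direct X → Y path, the estimated direct path should be close to zero, and almost all of the total association between X and Y is carried by the indirect route, whose size is the product of the two paths (a × b). With ordinary least squares estimates from the same data, total = direct + indirect holds exactly. A model that fits the data this well would not, on its own, show that the arrows point in the right direction.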

Correlation and Causation 

The major drawback of correlational studies is that they cannot be used to make 
unequivocal causal inferences. The golden rule of research design is: correlation does 
not equal causation. Correlations are necessary but not sufficient for establishing 
causality. They may strongly suggest causal influences, but cannot firmly establish 
them, although the absence of a correlation generally does rule out a causal relationship.
You will see this rule frequently ignored in popular journalism and sometimes in
the professional literature too. 
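A small simulation makes the golden rule concrete. In the Python sketch below, a third variable Z influences both X and Y, while X has no effect on Y whatsoever; yet X and Y come out substantially correlated. The variables, effect sizes, and function names are invented for illustration. Because Z is available in the simulation, the partial correlation of X and Y controlling for Z can be computed, and it is close to zero.

```python
import math
import random


def pearson_r(x, y):
    """Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """Correlation of x and y with the linear influence of z removed from both."""
    rxy, rxz, ryz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Simulated data: Z influences both X and Y; X has no effect on Y at all.
rng = random.Random(7)
n = 5000
Z = [rng.gauss(0, 1) for _ in range(n)]
X = [z + rng.gauss(0, 1) for z in Z]
Y = [z + rng.gauss(0, 1) for z in Z]

print(f"r(X, Y)     = {pearson_r(X, Y):.2f}")     # roughly 0.5, despite no X -> Y effect
print(f"r(X, Y | Z) = {partial_r(X, Y, Z):.2f}")  # roughly 0
```

In real correlational research, of course, the relevant third variables are often unknown or unmeasured, so statistical control of this kind can rule out only the alternative explanations that the researcher has thought of and measured.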

The existence and nature of causal relationships involves some difficult philosophical 
problems (Cook & Campbell, 1979; White, 1990). First, it is not clear exactly what 
psychologists mean when they use the words “cause” or “causality.” In the natural 
and biological sciences, causes are understood as mechanical or biochemical physical 
processes. In the 20th century, starting with Freud, psychologists extended the idea 
of causation to include mental mechanisms (Slife & Williams, 1995). Over the years, 
psychologists have used a wide variety of explanations to understand people, including 
intentions, developmental precedents, situational cues and opportunities, trait or 
diagnostic categories, unconscious meanings, biochemical processes, and genetic 
factors. Furthermore, there are various euphemisms for “cause” in psychology: the 
words “structural” (as in structural equation modeling) and “functional” (as in 
functional analysis) often stand in for “cause.” 

Second, in psychology and epidemiology we are often dealing with probabilistic 
rather than deterministic causes. Thus, when we say that smoking causes lung cancer 
or that poverty causes ill-health, we are not talking about certain causation (there are 
always exceptions) but about increased risk. Similarly, in clinical psychology, when we 
say that an intervention causes change, we mean that the intervention sets the conditions
for likely client change, but that change is not certain.
Third, going back to Bacon, Hume, and J.S. Mill (see Shadish, Cook, & Campbell,
2002; Haynes & O’Brien, 2000; Hill, 1965), philosophers and scientists have struggled
with defining a set of conditions for inferring a causal relationship. Four generally 
agreed conditions are: 

• Covariation: the two variables must consistently occur together. The stronger the 
correlation, the more convincing is the causal inference. 

• Precedence: the hypothesized causal variable must reliably precede the effect variable. 

• Exclusion of alternative explanations: other explanations for the observed covariation
must be reasonably excluded.

• Logical mechanism: there must be a plausible account for the hypothesized causal
relation. 

Correlational studies can establish the first condition, that two variables, A and B, 
covary. Information relevant to the second condition, how they are ordered in time, 
may also be known. The third condition, that of eliminating alternative explanations,
can be addressed to some extent within a correlational framework, although experimental
designs are much more suited to this. The fourth condition, providing a
plausible account, can be derived from previous theory and research. (It is worth 
noting that this fourth condition has frequently been overlooked, and that many 
experimental designs are “causally empty” in that they shed no light on specific causal 
processes; see Elliott, 2002; Haynes & O’Brien, 2000.) 

Let us take a simplified example, derived from early formulations of client-centered 
theory (Rogers, 1957). Suppose that variable A represents therapist empathy and variable
B represents the client’s outcome at the end of therapy, and that research has
established a significant positive correlation between therapist empathy and client 
outcome. Then a number of inferences about their causal relationships are possible, 
some of which are depicted in the following simple path diagrams (in which an arrow 
indicates the direction of a causal relationship): 

1. A → B

First it may be that A causes B: higher therapist empathy brings about better client 
outcomes. 

2. B → A

On the other hand, it is also possible that B causes A: clients who are improving in 
therapy may tend to draw more empathic responses from their therapists. 

3. A ← C → B


A very common situation is that A and B may both be caused by a third variable, C, such
as client psychological mindedness. It is plausible that clients who are more
psychologically minded could have better outcomes and also elicit more empathy from their
therapists. Thus the apparent causal relationship between A and B might be spurious:
that is, entirely explained by the influence of the third variable, C. The presence of such
third variables, which provide competing causal explanations, prevents the researcher
from drawing accurate causal inferences and thus reduces the study’s validity.

4. A → D → B

Yet another possibility is that A does not influence B directly, but only indirectly via D.
Variables such as D are known as mediator variables (Baron & Kenny, 1986; Frazier, Tix,
& Barron, 2004; Kazdin, 2007): they mediate (come in the middle of) the relationship
between two other variables. For example, higher therapist empathy could lead
specifically to increased client self-exploration, which could then lead to better client
outcome. We would then say that the causal relationship between empathy and outcome
was mediated by client self-exploration. The research enterprise is often characterized
by the search for mediating variables: researchers do not simply let it rest when they
observe a correlation; rather, they attempt to understand the links in the chain between
the two variables.

5. A → B
      ↑
      E

A fifth possibility is that the relationship between A and B differs according to the
values of E. Variables such as E are known as moderator variables (Baron & Kenny,
1986; Frazier et al., 2004): they moderate the relationship between two other variables,
acting like a gate, valve, or volume control. E could represent the type of client
presenting problem, for example, anxiety or depression, or a demographic variable
such as age or gender. If, for example, therapist empathy led to better client outcomes
in men but not in women, we would say that the causal relationship between empathy
and outcome was moderated by client gender.

6. The five possible relationships we have reviewed all assume that the obtained
correlation reflects some kind of causal relationship. There is, however, a sixth, often
overlooked possibility: that the correlation reflects a conceptual confound between A
and B. In other words, it may be due to overlapping meaning between the two variables,
in that they are two aspects of the same construct. Thus, therapist empathy (particularly
when rated by the client) may actually be a kind of outcome, in which case the correlation
may simply reflect the association of one kind of outcome with another.

It follows from this that the art of research design is to collect data in such a way as to 
examine the influence of third variables and of mediator and moderator variables, and 
to evaluate for conceptual confounding, in order to be able to draw clear inferences 
about the relationships between the variables under study. Experimental designs can 
help to do this by systematically manipulating one or more variables at a time. 
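Possibilities 3 to 5 can be made concrete with a minimal Python simulation sketch; it is an illustration, not a recipe for analysis. Everything in it is an assumption made for the example: all variables are generated as standard normal scores, the path strengths (0.6 and 0.5) are invented, and E is a hypothetical 0/1 moderator. It uses ordinary least-squares regression, a regression-based approach in the spirit of Baron and Kenny (1986), but omits the significance tests and interaction terms that a real mediation or moderation analysis would use.

```python
import random

def cov(x, y):
    """Sample covariance of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

def corr(x, y):
    """Pearson correlation coefficient."""
    return cov(x, y) / (cov(x, x) * cov(y, y)) ** 0.5

def slope(x, y):
    """Slope from the simple regression of y on x."""
    return cov(x, y) / cov(x, x)

def two_slopes(x1, x2, y):
    """OLS slopes of y on x1 and x2 entered together (intercept implied)."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det

rng = random.Random(2024)  # fixed seed so the sketch is reproducible
n = 20000

def noise():
    return rng.gauss(0, 1)

# Possibility 3 (hypothetical model): C causes both A and B; A has NO effect on B.
C = [noise() for _ in range(n)]
A3 = [c + noise() for c in C]
B3 = [c + noise() for c in C]
r_ab = corr(A3, B3)
a_given_c, _ = two_slopes(A3, C, B3)  # A's coefficient once C is held constant

# Possibility 4 (hypothetical model): A -> D -> B, with no direct A -> B path.
A4 = [noise() for _ in range(n)]
D = [0.6 * a + noise() for a in A4]
B4 = [0.5 * d + noise() for d in D]
total = slope(A4, B4)                   # total effect of A on B
direct, d_to_b = two_slopes(A4, D, B4)  # A -> B and D -> B, each controlling for the other
indirect = slope(A4, D) * d_to_b        # (A -> D) times (D -> B)

# Possibility 5 (hypothetical model): A affects B only when E == 1.
E = [rng.randint(0, 1) for _ in range(n)]
A5 = [noise() for _ in range(n)]
B5 = [0.5 * a * e + noise() for a, e in zip(A5, E)]
slope_by_e = {
    g: slope([a for a, e in zip(A5, E) if e == g], [b for b, e in zip(B5, E) if e == g])
    for g in (0, 1)
}

print(f"3: r(A,B) = {r_ab:.2f}; A -> B controlling for C = {a_given_c:.2f}")
print(f"4: total = {total:.2f}; direct = {direct:.2f}; indirect = {indirect:.2f}")
print(f"5: A -> B slope when E = 0: {slope_by_e[0]:.2f}; when E = 1: {slope_by_e[1]:.2f}")
```

Under these assumptions, A and B correlate at about .50 in the third-variable scenario even though A has no effect on B, and A's coefficient falls to about zero once C is entered; in the mediation scenario the direct effect is near zero and the total effect (about 0.30) equals the direct plus the indirect effect; in the moderation scenario the slope is about 0.5 when E = 1 and about zero when E = 0.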


EXPERIMENTAL DESIGNS 

The word “experiment” has the same root as the words “experience” and “peril”: all 
derive from the Latin for try or test, with the connotation of risk or danger. To carry 
out an experiment is to put your hypotheses in danger of failing or being falsified (cf. 
Popper, 1963: see Chapter 2). Psychologists today usually think of an experiment as 
a study involving random assignment to two or more groups or conditions. However,
this view is a recent development (within the past 60 years or so) and represents a 
methodological narrowing under the influence of positivism and biomedical research. 
Before that time, and today in its ordinary usage, an experiment referred to “the 
action of trying anything, or putting it to proof; a test, trial” (Oxford English 
Dictionary). 

An experimenter interferes with the natural course of events, in order to construct 
a situation in which competing theories can be tested. These theories often concern 
causal influences between variables. In physics, formal experimentation began with 
Galileo, who attempted to test his theories of dynamics by rolling balls down inclined 
planes, initiating what we now call the hypothetico-deductive method. Before Galileo, 
science had relied on drawing generalizations from passive observations and from 
informal trial and error experimentation. 

Experimental designs are of particular interest to clinical psychologists because
therapeutic work itself can be thought of as an experimental intervention. In
collaboration with the client, the therapist considers a problematic situation in the
client’s life, forms a hypothesis about what is causing it and what might be done to
improve it, helps the client try to change something about it, and then observes
the results. Here the tentative connotation of experiment is apt: if the intervention
does not work, client and therapist then repeat the cycle by reformulating the
problem, trying something else, and once more observing the results. This
experimental approach to therapeutic work lies at the core of the applied scientist
model (see Chapter 2).

Most psychotherapy outcome studies use an experimental design. This chapter will use 
a comparative outcome trial by Dimidjian et al. (2006) as a running example (see box). 

Example of a clinical trial: Dimidjian et al. (2006)

Participants (N = 241) met DSM criteria for major depressive disorder. They
were randomly allocated to one of four treatment conditions: (1) behavioral
activation, (2) cognitive therapy, (3) anti-depressant medication, and (4) medication
placebo. The main outcome measures, the Beck Depression Inventory
(Beck, Steer, & Brown, 1996) and the Hamilton Rating Scale for Depression
(Hamilton, 1960), were completed at baseline and mid- and post-treatment
(approximately weeks 0, 8, and 16). (The results showed that, for severely
depressed clients, both behavioral activation and anti-depressant medication
were superior to cognitive therapy, but for the less severely depressed patients,
there were no differences between treatments.)


Terminology

There is a considerable amount of terminology in the experimental design area. The
treatment or intervention that is varied by the experimenter is known as the independent
variable, the experimental condition, or the experimental manipulation; the measure
of the independent variable’s effect is known as the dependent or outcome variable. In
the Dimidjian et al. (2006) study the independent variable, or experimental condition,
was the type of therapy: behavioral activation, cognitive therapy, anti-depressant
medication, or placebo. (Note that the term “group” is often used to denote one of the
experimental conditions. It is potentially confusing in clinical applications, as it
wrongly suggests that group rather than individual therapy was used.) The dependent
variables were the two depression measures.

Frequently, in addition to one or more of the groups in the design being subjected
to an experimental intervention, another group provides a control group, so called
because it is used to rule out or control for the influence of one or more third
variables. For example, the medication placebo group in the Dimidjian et al. (2006) study
was used to control for the instillation of hope of successful outcome and also for the
potential therapeutic benefits of the assessment interviews. It is important to keep in
mind that some experiments (“quasi-experiments,” see below) may use only one
treatment condition. In the treatment literature, such one-group experiments are
referred to as open clinical trials or uncontrolled trials; they are often used to provide
an initial test of a promising treatment, prior to pitting it against a control group or
other treatment (a randomized clinical trial).

Many types of experimental design are covered in the statistical literature (e.g., 
Field, 2013; Howell, 2010; Shadish et al., 2002; Winer, Brown, & Michels, 1991), 
such as factorial designs, repeated-measures designs and Latin squares. Here we will 
look at some of the simpler ones. The statistical method used in experimental studies
is usually the analysis of variance (ANOVA), or related methods such as the t-test,
multivariate analysis of variance (MANOVA), or the analysis of covariance (ANCOVA).
However, before examining some specific types of designs, it is helpful to consider 
some general principles of validity, in order to provide a framework for thinking about 
the strengths and weaknesses of any given design. 


Cook and Campbell’s Validity Analysis 

The work of Campbell and his collaborators (Campbell & Stanley, 1966; Cook &
Campbell, 1979; Shadish et al., 2002) has been enormously influential: Cook and
Campbell’s (1979) book or its successor, Shadish et al. (2002), is required reading for
all serious applied psychology researchers. Campbell’s ideas were developed
specifically in the context of designs that attempt to infer causality, which is why we are
addressing them here under the heading of experimental designs. However, the
concepts can be applied to all designs, descriptive and correlational as well as
experimental. They are invaluable both for planning one’s own research and for
evaluating other people’s.

The central thrust is an analysis of different types of design validity for generalized 
causal inference. We are now using the concept of validity in a broader sense than in 
Chapter 4, when we discussed the reliability and validity of measures; here we will be 
talking about the validity of the conclusions that you can draw from the study as a 
whole. Campbell and Stanley (1966) introduced the fundamental distinction between 
internal and external validity. Internal validity refers to the degree to which causality 
can be inferred from a study: that is, in the language of experimentation, is the 
independent variable producing the changes in the dependent variable? (It is not to
be confused with internal consistency, which is a type of reliability.) External validity
refers to the degree to which the results of the study may be generalized over time,
settings or persons to other situations, for instance, whether the patients studied in a
clinical trial are representative of the patients seen in other settings. A related concept
is ecological validity, which assesses how artificial the measures and procedures of the
study are, compared to the world outside of the research setting.

Cook and Campbell’s (1979) expanded treatment considered statistical conclusion
validity in addition to internal validity (both of which concern interpreting covariation)
and construct validity in addition to external validity (both of which concern the
generalizability of the study; construct validity can be seen as assessing generalizability
to the underlying construct which the different measures attempt to tap). In
this chapter, we shall principally look at internal validity. Construct validity, which 
derives from the work of Cronbach and Meehl (1955), was introduced in Chapter 4. 
External validity is covered in Chapters 10 and 12, and statistical conclusion validity, 
which concerns the appropriateness of the statistical methods, is also covered in 
Chapter 12. 

Each of the four validity types can be defined by a key question (see Table 8.1); 
these questions are asked sequentially about any study that claims to examine causal 
relationships. The first one asks “Is there an effect there at all?” (statistical conclusion 
validity). If the answer to this is affirmative, the next question is “Is the effect causal?” 
(internal validity). Then you ask “What does it mean (both in terms of the outcome 
variables and the experimental manipulation)?” (construct validity). Finally, “Does it 
generalize?” (external validity). 

The central dilemma in designing a study is that there is often a trade-off between
internal and external validity. It is possible to achieve high internal validity in a
laboratory, where the researcher can exert considerable control. A common criticism of
social psychology experiments of the 1960s and 1970s was that, although they had
achieved high internal validity by being conducted in a controlled laboratory setting
with a homogeneous population (often young white male US undergraduates), they
had in so doing sacrificed their external validity, that is, their generalizability or
real-world relevance (Sears, 1986). In a nutshell, the designs were clever but artificial
(Armistead, 1974; McGuire, 1973). The same criticism also applies to early analog
studies of behavior therapy, which were conducted in artificial laboratory conditions
with volunteer clients on specific phobias such as spider or public speaking fears
(Shapiro & Shapiro, 1983). Conversely, field research, which is conducted in natural
settings with clinical populations, usually has high external validity. Unfortunately,
this is often at the expense of lower internal validity, since experimental control
is much more difficult to obtain in field settings, for a variety of reasons which we
discuss below.


Table 8.1 Four validity types (Cook & Campbell, 1979)

Validity type             Defining question
Statistical conclusion    Is there an effect?
Internal                  Is it causal?
Construct                 What does it mean?
External                  Does it generalize?

The main thrust of Cook and Campbell’s work is that all designs are imperfect, but 
that it is possible to analyze systematically the potential nature and consequences of 
their imperfections, which are known as threats to validity. The researcher’s task is to 
try to achieve an optimal design given the aims and constraints of the research. To 
quote one prominent psychotherapy researcher: “it is therefore impossible to design 
the perfect study. The art of outcome research design thus becomes one of creative 
compromise based upon explicit understanding of the implications of the choices 
made” (Shapiro, 1996: 202). Cook and Campbell’s framework is an indispensable 
tool for thinking about the consequences of such compromises. 

Cook and Campbell’s Classification of Research Designs 

In addition to analyzing validity issues, Campbell and his collaborators (Campbell & 
Stanley, 1966; Cook & Campbell, 1979; Shadish et al., 2002) also proposed a 
taxonomy of experimental designs. They introduced the fundamental distinction
between quasi-experimental and experimental designs. Quasi-experiments are defined as
“experiments that have treatments, outcome measures, and experimental units, but
do not use random assignment to create the comparisons from which treatment-caused
change is inferred” (Cook & Campbell, 1979: 6). However, in the light of our
earlier discussion about the term “experiment” being too narrowly defined within 
psychology, it seems preferable to use the more precise terms nonrandomized and 
randomized designs instead of quasi-experiment and experiment. Cook and Campbell 
give an extensive listing of possible nonrandomized and randomized experimental 
designs. Here we will consider the most commonly used ones as illustrative 
examples. 


Nonrandomized Designs 
One-group Posttest-only Design 

This rudimentary design can be depicted in the following diagram: 

X O 

In Cook and Campbell’s notation, X stands for an experimental treatment, that is, 
something done to the participants, such as a clinical intervention. (The notation can 
also be extended to cases where X is not an experimental treatment, but rather some 
other event that occurs to the participants, such as a disease or a disaster.) O stands for 
an observation or measurement, of one or of several variables. 

The one-group posttest-only design, originally labeled the one-shot case study by 
Campbell and Stanley (1966), is the simplest possible design. It is characterized as a 
quasi-experimental design because of the experimental intervention, X, although it 
can also be conceptualized as a type of descriptive study. One common application is 
in consumer satisfaction studies, in which clients are surveyed during or after a 
psychological intervention to find out how they felt about it. This design is useful for 
generating hypotheses about causation, but it is almost always insufficient for making
causal inferences, because doing so invites the logical fallacy referred to as post hoc ergo 
propter hoc (because B happens after A, B must result from A). 

However, as Cook and Campbell (1979) note, this design should not be dismissed 
out of hand in research aimed at testing causal explanations. It can be rescued if 
enough contextual information is available, and especially if one takes a detective 
work approach, looking for signs or clues about causality, which Cook and Campbell
(1979) call “signed causes.” In the clinical context, such signs might include
posttreatment ratings of perceived change or retrospectively completed estimates of
pretreatment levels of functioning. (The detective metaphor is appropriate here, since
Sherlock Holmes and his successors based their causal inferences, concerning
whodunnit, upon post-hoc data.) Shadish et al. (2002) note that the interpretability of
this design is enhanced when the effect is clear and measured in multiple ways; this 
allows for a pattern-matching approach that compares effects to potential causes, as 
an epidemiologist works backwards from a disease outbreak to possible causes. 

One-group Pretest-Posttest Design 
O₁ X O₂

This design extends the previous one by adding a pre-measure, which then allows a 
direct estimate of change over time. It is often known as an open clinical trial or an 
uncontrolled trial. For example, this design may be used in evaluating the outcome of 
a clinical service. The psychologist might administer a measure of problem severity, 
such as the Generalized Anxiety Disorder scale (GAD-7: Kroenke et al., 2007), to all 
clients before and after therapy. 

However, it is not immediately possible to attribute change in the outcome variables 
to the experimental treatment, X. This is because, as in the previous design, it is risky 
to infer post hoc ergo propter hoc. For example, a newspaper headline in a feature on
mental health stated: “Despite being the target of suspicion, the evidence that
antidepressants work is indisputable: more than two-thirds of people taking them recover”
(London Observer, 12 January 1992). The inference appears to be that, since taking
antidepressants is associated with a good chance of recovery, they must cause
that recovery. (To see the logical fallacy more clearly, try substituting taking
antidepressants with some less obviously therapeutic activity, such as watching television.)
In addition, the implication that the antidepressants cause recovery is further called
into question since the recovery rate of depressed people who did not take
antidepressants is not supplied; perhaps two-thirds of them recover also. The availability of such
data would result in a nonequivalent groups pretest-posttest design, described below. 

Cook and Campbell (1979) provide a checklist of possible threats to internal 
validity in this design. For researching the effects of psychological interventions, the 
most important ones are: 

• Endogenous change, which refers to any kind of change within the person. The
most important instance is spontaneous recovery, also called spontaneous remission,
which means recovery occurring with no apparent external reason.
• Maturational trends refer to the growth or maturation of the person. This is a
special case of endogenous change. It is, of course, especially relevant to research 
with children, who may often “grow out of” their psychological problems. 

• Reactivity of measurement, where the act of making a measurement changes the
thing being measured. For example, there may be practice effects on a psychological
test, where participants perform better on a second administration of a test because
they have learned how to respond. As another example, there is evidence that
participants who are interviewed as part of their initial assessment may experience a
clinical benefit (Svartberg, Seltzer, Choi, & Stiles, 2001).

• Secular drift, that is, long-term social trends, taking place over a time scale of
years. These are relevant if you are doing large-scale longitudinal research on
public health interventions, such as on smoking or on safe-sex behaviors, where it
is important to take such trends in the target outcome variable into account when
looking at the impact of the intervention.

• Interfering events, that is, significant events other than the experimental
intervention that occur between the pretest and the posttest. For example, fiscal changes
such as increases in tobacco or alcohol taxes may reduce consumption; or an
international crisis may increase general anxiety.

• Regression to the mean. Participants in clinical studies are often selected on the 
basis of their extreme scores on a measure of psychological distress. Clients for a 
therapy outcome study, for example, might be selected (or select themselves) on 
the basis of high scores on an anxiety scale. Regression to the mean arises from
unreliability of measurement: scores on the posttest will tend to show that clients
have improved, even if the therapy was ineffective. This is because the extreme
scores at the pretest will have partly reflected random measurement errors, which
will tend not to be as extreme at the posttest.
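The regression-to-the-mean argument can be checked with a short simulation sketch. Its assumptions are invented purely for illustration: true anxiety scores are normally distributed (mean 50, SD 10), each testing occasion adds independent measurement error (SD 6), clients are selected with a hypothetical pretest cutoff of 65, and no intervention is simulated at all.

```python
import random
from statistics import mean

rng = random.Random(4)  # fixed seed so the sketch is reproducible
n = 50000
# Hypothetical true anxiety levels in a general population.
true_score = [rng.gauss(50, 10) for _ in range(n)]
# Each occasion adds independent measurement error; nobody receives therapy.
pretest = [t + rng.gauss(0, 6) for t in true_score]
posttest = [t + rng.gauss(0, 6) for t in true_score]

# Select only people with extreme pretest scores (hypothetical cutoff).
CUTOFF = 65
selected = [i for i in range(n) if pretest[i] >= CUTOFF]
pre_mean = mean([pretest[i] for i in selected])
post_mean = mean([posttest[i] for i in selected])
print(f"n = {len(selected)}; pretest mean = {pre_mean:.1f}; posttest mean = {post_mean:.1f}")
```

Under these assumptions the selected group's posttest mean comes out roughly 5 points lower than its pretest mean, an apparent improvement produced entirely by measurement error, while the full sample shows essentially no change.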

One further problem in interpreting the findings from this and other experimental 
designs has to do with the construct validity of the experimental intervention (Cook & 
Campbell, 1979), in other words, what the experimental intervention actually
represents. The distinction between internal validity and the construct validity of the
experimental intervention is sometimes hard to grasp. The question of internal validity asks
whether change can be attributed to the intervention, X, or to something else; whereas 
the question of the construct validity of the experimental intervention accepts that X 
is producing the change and asks what about it (what construct) actually accounts for 
the change? 

Some possible construct validity problems are: 

• Confounding variables. “Confounding” means occurring at the same time as, and 
thus inextricably bound up with. In the Dimidjian et al. (2006) study described 
earlier, the type of therapy was confounded with the person of the therapist, since 
each of the two psychological interventions (behavioral activation and cognitive
therapy) was delivered by a different small set of therapists. It is possible that the 
differences between the therapies could simply have reflected differences in the 
personality or skill of the therapists who delivered them. 
• Expectancy effects. Clients may benefit from a service simply because they expect
to, rather than as a direct result of what the service actually delivers. This expectancy 
effect is related to the placebo effect in drug studies, where patients may benefit 
from pharmacologically inert treatments. Related phenomena are the
demand characteristics of the study (Orne, 1962), whereby participants attempt
to understand what is expected of them in the social situation of the experiment,
and the more general issue of experimenter effects (Rosnow & Rosenthal, 1997),
meaning any influence that the researcher may exert, often unconsciously,
on participants’ behavior.

• Hawthorne effect, in which the research itself produces beneficial change. This 
effect takes its name from a famous study in occupational psychology, in which 
increasing the level of illumination in a factory was found to increase industrial 
output, but so also was decreasing the level of illumination (see Adair, 1984; 
Rosnow & Rosenthal, 1997). 

The difference between O₁ and O₂ (i.e., the total pre-post change) is sometimes
called the gross effect of the intervention (Rossi, Lipsey, & Freeman, 2004). The net
effect is defined as the effect that can reasonably be attributed to the intervention 
itself, that is, the gross effect minus the effect due to confounding variables and error. 
In clinical research it is often a good first step to use a simple design such as the
one-group pretest-posttest to demonstrate that a gross effect exists at all. Subsequent
studies can then use more sophisticated designs with control or comparison groups to 
estimate the net effects, rule out the effects of possible confounding variables and 
examine which components of the intervention are actually responsible for client 
improvement. 
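The gross/net distinction is simple arithmetic. The sketch below uses invented mean scores (not from any study) on a severity scale where higher means worse, and it assumes that a comparable comparison group is available to estimate how much change would have happened anyway. Subtracting that change from the gross effect (a difference-in-differences) is one simple way of approximating the net effect, and it is only as good as the comparability of the two groups.

```python
def gross_effect(pre_mean: float, post_mean: float) -> float:
    """Total pre-post change, O1 minus O2 (positive = scores fell, i.e., improvement)."""
    return pre_mean - post_mean

def net_effect_estimate(treated_pre: float, treated_post: float,
                        comparison_pre: float, comparison_post: float) -> float:
    """Gross effect in the treated group minus the change in a comparison group,
    which stands in for spontaneous recovery, regression to the mean, and so on."""
    return (gross_effect(treated_pre, treated_post)
            - gross_effect(comparison_pre, comparison_post))

# Invented means, for illustration only.
gross = gross_effect(14.0, 8.0)
net = net_effect_estimate(14.0, 8.0, 14.0, 12.0)
print(gross, net)  # 6.0 4.0
```

Here 2 of the 6 points of gross improvement would be attributed to factors other than the intervention, leaving an estimated net effect of 4 points.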

Schmidt and Hunter (2015) argue that Cook and Campbell’s (1979) list of validity 
threats has been widely misinterpreted as indicating that one-group pretest-posttest 
designs are automatically fatally flawed. Rather, researchers should be aware of, and 
attempt to mitigate the effect of, potential validity threats. If this is done, such designs 
will frequently allow causal inference of treatment efficacy. 

Nonequivalent Groups Posttest-only Design 

NR X O 
NR O 

In the notation, NR stands for a nonrandomized assignment to experimental 
groups. 

This is like the one-group posttest-only design, except that the group receiving the 
experimental treatment is compared to another similar group that did not receive the 
treatment. For example, an intervention could be instituted on one hospital ward and 
another ward be used as a comparison group. Another variation is to use the design 
to compare several different active treatments, for example, three different treatment 
regimes on three hospital wards. 

Unlike the previous and the following design, this one provides no direct estimate 
of pre-post change. It can be used for retrospective studies, where there is no 
pre-measure, and post-measures are all that one can manage in the circumstances.
(This design can also be regarded as correlational, since what is being studied is the 
association between the group membership variable and the outcome variable.) In 
clinical applications, X usually represents an intervention. However, the conceptual 
framework of the design can also be used for comparing groups who differ in having 
experienced some life stressor or other risk factor (e.g., child maltreatment, passive 
smoking, or trauma), which it would clearly be unethical to manipulate experimentally.
The consequences of this stressor or experience can then be evaluated and causal
hypotheses generated. Similarly, some epidemiological case-control studies can also 
be considered under this category, where X would represent having some illness or 
predisposition to illness, for example, being HIV positive. 

When used in this way, particularly to study the impact of negative events, this 
design is often known as a natural experiment (Rutter, 2007; Shadish et al., 2002). 
One example is the Romanian orphans study (Kreppner et al., 2007), which followed
up children who had suffered severe emotional and physical deprivation in
orphanages in 1980s Romania and then had subsequently been adopted in the 
United Kingdom. By analyzing at what age they had been adopted, it was possible 
to test hypotheses about critical developmental periods for developing psychological 
problems as a result of maltreatment. There appeared to be a threshold of about 6 
months old: if the deprivation had ended before that, its consequences were less 
severe, but if it had continued beyond that, the consequences were more 
pronounced. 

The major threat to internal validity in this design (and equally in the following 
design) is uncontrolled selection. That is, since the assignment to groups is not random, 
one cannot assume that the two groups were the same before the treatment, X. 
Participants in the different groups may differ systematically on, for example,
motivation, problem severity, or demographic characteristics. Even if the researcher is able to
compare the groups on these variables, there still may be other important systematic 
differences that are not tested for. Beyond this, Shadish et al. (2002) provide an 
extensive list of threats to the validity of such case-control designs. 
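The uncontrolled-selection threat can be illustrated with a small simulation sketch. Its assumptions are invented for illustration: the posttest outcome depends only on a hypothetical motivation variable, the intervention X has no effect whatsoever, and more motivated people are more likely to end up in the treated group. For contrast, the same people are also split by random assignment.

```python
import random
from statistics import mean

rng = random.Random(6)  # fixed seed so the sketch is reproducible
n = 20000
# Hypothetical confounding variable: motivation.
motivation = [rng.gauss(0, 1) for _ in range(n)]
# Posttest outcome depends on motivation only: the intervention X has NO effect.
outcome = [m + rng.gauss(0, 1) for m in motivation]

def group_difference(in_treated):
    """Mean outcome of the 'treated' group minus that of the comparison group."""
    treated = [y for y, t in zip(outcome, in_treated) if t]
    comparison = [y for y, t in zip(outcome, in_treated) if not t]
    return mean(treated) - mean(comparison)

# Nonrandom assignment: more motivated people are more likely to receive X.
self_selected = [m + rng.gauss(0, 1) > 0 for m in motivation]
# Random assignment: a coin flip unrelated to motivation.
randomized = [rng.random() < 0.5 for _ in range(n)]

diff_self = group_difference(self_selected)
diff_random = group_difference(randomized)
print(f"self-selected: {diff_self:.2f}; randomized: {diff_random:.2f}")
```

Under these assumptions the self-selected comparison shows a sizable posttest difference (about 1.1 points on the outcome scale) that is produced entirely by selection, whereas the randomized comparison shows a difference close to zero.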


Example of a nonequivalent groups posttest-only design: 
the consumer reports study 

Seligman (1995) reported on the results of a large survey, conducted by the
U.S. consumer magazine Consumer Reports, of people who had received
psychological therapy. Respondents were asked about the kind of help they had
received and their evaluation of it. The large sample made it possible to compare
different groups within the design: for example, those who had been treated in
different modalities of therapy or by different types of professionals. It can,
therefore, be regarded in part as a nonequivalent groups posttest-only design.
Generally, patients reported substantial improvement, and there were no
differences between modalities or types of therapist, but there was a positive effect for
length of therapy.
Nonequivalent Groups Pretest-posttest Design

NR O X O 

NR O (Y) O 

This commonly used design combines the features of the previous two, and helps 
to rule out some of the associated internal validity threats. The group in the lower 
part of the diagram that does not receive the experimental intervention is called a 
control group. Several different types of control groups are possible, ranging from an 
alternative comparison treatment (Y in the diagram) to no treatment at all (in which 
case no letter is used in the diagram). The type of control group depends on the 
research question: whether you are trying to show that the experimental treatment is 
as good as or better than an established treatment, or simply better than nothing at 
all. We will discuss these issues more fully below under the heading of the randomized 
groups pretest-posttest design. 

This design can easily be extended to encompass two or more experimental or
control groups. An example is the Stanford Three Community study (Farquhar et al.,
1977), which studied the effects of the mass media and community interventions on 
the prevention of heart disease. It was conducted in three small towns in California. 
One town received only the pre-post measurement, another a sustained mass-media 
campaign, and a third mass-media plus community intervention in the form of face- 
to-face instruction. The results showed encouraging effects for the mass-media 
condition, which were augmented in the community support condition. 

Another large-scale example is Stiles, Barkham, Mellor-Clark, and Connell’s (2008)
study of the outcome of three different therapeutic orientations delivered in a
naturalistic setting in the U.K. National Health Service. The sample consisted of 5613
patients who received either CBT, person-centered therapy or psychodynamic therapy 
in National Health Service primary care settings. Clients in all types of therapy had 
virtually identical pre-test values, showed large amounts of pre-post change, and 
outcomes across the three different therapies were approximately equivalent. Like all 
naturalistic studies, this had its shortcomings (see Clark, Fairburn, & Wessely, 2008, 
for a methodological critique, and Stiles, 2008, for the authors’ rebuttal). 

Prospective case-control studies, in which a cohort of individuals is studied
longitudinally to evaluate the impact of a disease or of stressful life events, can also
be considered under this category. Because participants are studied prospectively,
measures are obtained before the event of interest. For example, a cohort of elderly
people might be studied at one-year intervals to examine the psychological impact of
bereavement by comparing the individuals who become bereaved with those who do not.

As in the previous design, the major threat to internal validity is uncontrolled
selection: the groups may differ systematically in ways other than the presence or
absence of the experimental treatment or event of interest. Sometimes experimenters
try to compensate for differences in the groups by statistical methods, such as
analysis of covariance or multiple regression (Shadish et al., 2002). If, for instance, the
experimental group turns out to be younger than the control group, age might be
used as a covariate in the analysis (often described as partialling out the effects of the
covariate, in this case age). This can be misleading when the nonequivalent groups are
drawn from two different populations (Miller & Chapman, 2001): it is like trying to
equate an elephant and a mouse by adjusting for their relative weights. However,
when the groups are drawn from the same population, such analyses can be
appropriate in nonrandomized designs. Another approach is to construct a control group
by matching participants in each group on key variables, such as age, gender, and
problem severity (Rossi et al., 2004). However, this is difficult to accomplish and
again can be misleading if the groups represent two different populations.
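
To make the matching strategy concrete, the following Python sketch pairs each participant in the experimental group with the closest unused control participant of the same gender, provided their ages are within a chosen tolerance. The `Participant` record, its fields, the five-year tolerance, and the greedy first-come pairing rule are all illustrative assumptions, not a procedure taken from Rossi et al. (2004).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    pid: str
    gender: str  # matched exactly (hypothetical field)
    age: int     # matched as closely as possible (hypothetical field)


def match_controls(treated, controls, max_age_gap=5):
    """Greedy 1:1 matching: exact on gender, nearest available control on age.

    Returns (treated, control) pairs. Treated participants with no
    acceptable partner are left out rather than given a poor match.
    """
    available = list(controls)
    pairs = []
    for t in treated:
        candidates = [c for c in available
                      if c.gender == t.gender and abs(c.age - t.age) <= max_age_gap]
        if not candidates:
            continue  # nobody close enough: leave this participant unmatched
        best = min(candidates, key=lambda c: abs(c.age - t.age))
        available.remove(best)  # each control can be used only once
        pairs.append((t, best))
    return pairs


treated = [Participant("T1", "F", 34), Participant("T2", "M", 50)]
controls = [Participant("C1", "F", 36), Participant("C2", "F", 60),
            Participant("C3", "M", 70)]
for t, c in match_controls(treated, controls):
    print(t.pid, "matched with", c.pid)  # T1 matched with C1
```

Here T2 is left unmatched because the only male control is 20 years older. Discarding such cases keeps the matched groups comparable but narrows the sample, and, as noted above, no matching algorithm can make groups drawn from different populations equivalent.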

The interpretability of this and other nonrandomized experimental designs can be 
enhanced by adding pretests and later assessments to examine the process of change 
more closely and also by adding specific control groups to deal with specific internal 
validity threats. 

Interrupted Time-Series Design 

O1 O2 ... O20 X O21 ... O40

This design extends the one-group pretest-posttest design to cover multiple 
measures over time. It has a different rationale to the previous designs, in that it 
attempts to pinpoint causal influences by looking at discontinuities between a series 
of baseline measures and a series of follow-up measures (we have arbitrarily depicted 
20 baseline and 20 follow-up points; in practice there may be considerably more). 
It is good for studying naturally occurring chronological data and can be used to 
analyze existing data from large samples, for instance, to look at changes in national 
alcohol or tobacco consumption following taxation changes, or reductions in
injuries following car seat belt legislation (Guerin & MacKinnon,
1985). It can also be used in single-case designs, in which participants serve as their 
own controls (see Chapter 9). The major threat to internal validity is the presence 
of interfering events—other things occurring at the same time as the treatment, X, 
that affect the outcome variable—but this can be dealt with by careful monitoring 
for such events. 
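
The logic of looking for a discontinuity can be sketched in a few lines of Python. This is a deliberately bare-bones version of segmented regression, assuming equally spaced observations and a straight-line baseline trend; the series below are invented, and a real analysis would also need to allow for autocorrelation between successive observations.

```python
from statistics import mean


def fit_line(ts, ys):
    """Ordinary least-squares intercept and slope for y = a + b * t."""
    t_bar, y_bar = mean(ts), mean(ys)
    sxx = sum((t - t_bar) ** 2 for t in ts)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys))
    slope = sxy / sxx
    return y_bar - slope * t_bar, slope


def level_change(baseline, followup):
    """Mean gap between the follow-up observations (after X) and the
    baseline trend projected forward to the same time points."""
    n0 = len(baseline)
    a, b = fit_line(list(range(n0)), baseline)
    projected = [a + b * t for t in range(n0, n0 + len(followup))]
    return mean(y - p for y, p in zip(followup, projected))


# Invented series: a rising baseline, then a step of +5 after X.
baseline = [10 + 0.5 * t for t in range(20)]       # O1 ... O20
followup = [15 + 0.5 * t for t in range(20, 40)]   # O21 ... O40
print(round(level_change(baseline, followup), 2))  # 5.0
```

Comparing the raw means of the two segments would suggest a change of 15 points, because it mistakes the pre-existing upward trend for an effect of X; projecting the baseline trend forward recovers the 5-point step built into the invented data.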


Randomized Designs 

The essential feature of randomized experimental designs (as opposed to
nonrandomized or quasi-experimental designs) is the random assignment of participants to
experimental groups or conditions. The rationale is that the groups will then be
equivalent, within the limits of chance, except for the experimental manipulation: in
other words, randomization removes systematic selection bias as a threat to internal validity.
It also allows the use of the statistical theory of error. Note that randomization
(random assignment to experimental conditions) is not to be confused with random
sampling (a method of unbiased selection of participants for the study: see Chapter 10).

Randomized experimental designs enable the experimenter to manipulate a single 
variable at a time, and thus any relationships established between the independent and 
dependent variables are more likely to fulfill the first three conditions for inferring 
causality discussed earlier in this chapter (covariation, precedence, and exclusion of 
alternative explanations). The judicious choice of control and comparison groups (see 
below) allows the effects of key variables to be isolated. 

Randomized designs are regarded by influential bodies (e.g., the Cochrane
Collaboration, NICE, York Centre for Reviews) as the “gold standard” research
design for evaluating treatments. They are the standard design used in medicine to
evaluate new drugs or other medical or surgical treatments. They are known as
randomized controlled (or sometimes clinical) trials, abbreviated to RCTs in either
case. In clinical psychology, RCTs are often known as efficacy studies, as opposed to 
effectiveness studies, which are uncontrolled studies conducted in field settings. There 
is considerable debate about the relative place of each type of research (see, e.g., 
Seligman, 1995), which essentially boils down to how much weight one gives to 
internal validity at the expense of other validity types, especially external validity. (We 
will address these issues further below, and also in Chapter 11.) 

The theory of experimental design was developed by the statistician Fisher in the
1920s. His early work was mostly done in agriculture, looking at how crop yields were
affected by different fertilizers or different varieties of grain. This agricultural origin
accounts for some of the terminology still used to describe experimental designs
(e.g., split plots or randomized blocks refer to parts of fields); it also provides
another area of application in which to picture specific designs. Here we will focus
mainly on designs in therapy outcome research, an area in which much work has been
done and which is central to discussion of evidence-based practice and empirically
supported therapies. The statistical textbooks (e.g., Field, 2013; Howell, 2010; Shadish
et al., 2002; Winer et al., 1991) cover many different experimental designs. We will
illustrate the issues in the context of the paradigmatic design in clinical research, the
randomized groups pretest-posttest design:

R O X O 

R O (Y) O 

This design, or its close relatives, is the standard design for randomized controlled 
trials in medicine, in which a new therapy or drug is tested against a pill placebo or a 
no-treatment control. 

In the notation, R denotes random assignment of participants to experimental
conditions. Such assignment needs to be done without bias, in order to ensure that
each participant has an equal chance of being in each condition. This may be done in
several ways, for instance, by using random number tables, a web-based program (e.g.,
http://www.randomizer.org), or an independent statistician. On the other hand,
nonrandom or pseudo-random methods of allocation, for example, by assigning the 
first 10 participants to the experimental group and the next 10 participants to the 
control group, may introduce systematic error (Cook & Campbell, 1979). 
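
As one illustration of an unbiased allocation procedure, the Python sketch below generates an allocation list from randomly permuted blocks, a common refinement of simple randomization that keeps group sizes balanced throughout recruitment. The function name, the block size of 4, and the use of Python's standard pseudo-random generator are assumptions for the example; in an actual trial the list would typically be prepared and concealed by someone independent of recruitment, such as the independent statistician mentioned above.

```python
import random


def block_randomize(n_participants, conditions=("X", "Y"), block_size=4, seed=None):
    """Allocation sequence built from randomly permuted blocks.

    Every block contains each condition equally often in a random order,
    so group sizes can never drift far apart as participants are enrolled.
    """
    if block_size % len(conditions) != 0:
        raise ValueError("block_size must be a multiple of the number of conditions")
    rng = random.Random(seed)  # a recorded seed lets the list be reproduced and audited
    sequence = []
    while len(sequence) < n_participants:
        block = list(conditions) * (block_size // len(conditions))
        rng.shuffle(block)  # random order within this block
        sequence.extend(block)
    return sequence[:n_participants]


allocation = block_randomize(20, seed=2024)
print(allocation.count("X"), allocation.count("Y"))  # 10 10
```

Unlike assigning the first 10 participants to one group and the next 10 to the other, every position in the sequence is determined by chance, yet after each completed block of four the two groups are exactly equal in size. One known drawback is that staff who learn the block size can predict the last allocation in a block, which is a further reason to keep the sequence concealed.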

The independent variable, that is, whether or not the participants receive the
experimental intervention, is known as a between-groups factor (since it divides the
participants into groups). The design depicted above is said to have a between-groups factor
with two levels (i.e., the experimental group and the control group). This basic design 
may be extended in several ways, such as: 

More than two levels. There can be more than two levels of the between-groups
factor, that is, there may be more than one experimental group or more than one
control group. In our running example, the type of intervention factor in the Dimidjian
et al. (2006) study had four levels: behavioral activation, cognitive therapy,
antidepressant medication, and placebo.

Multifactorial designs, in which there is more than one between-groups factor. In
our running example, the Dimidjian et al. (2006) study had only one between-groups
factor, the type of intervention, but it could have built in a second one, such
as a two-level length of intervention factor (e.g., brief treatment versus medium-term
treatment).

The pretest-posttest design is an example of a repeated-measures design, that is, one
in which the same individuals are assessed at two or more points in time. Additional
levels of the repeated-measures factor may be introduced; for example, there may be
a follow-up assessment six months or a year after the intervention has ended.

Blocking factors are ones that represent participant individual difference variables 
(e.g., type of presenting problem) within the overall research design (this is also 
referred to as stratification). Such factors are included in order to examine their effect 
as potential moderator variables or in order to ensure that the experimental groups are 
balanced on crucial variables. The procedure is that participants are grouped into the 
relevant categories before the randomization to experimental treatments takes place. 
For example, the Dimidjian et al. (2006) study had a blocking factor of the client’s 
initial level of depression (moderate or severe). The researchers first allocated potential 
clients to the appropriate severity level and then randomly assigned people from the 
same cell to the therapists within each of the four experimental treatment conditions. 
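
The blocking procedure can be sketched as follows: participants are first sorted into strata, and randomization then takes place separately within each stratum. In this Python sketch the participant identifiers, the two severity levels, and the four arm labels (loosely modeled on the four conditions of the running example) are hypothetical, and participants are assigned directly to conditions rather than to individual therapists.

```python
import random
from collections import defaultdict


def stratified_assign(participants, stratum_of, conditions, seed=None):
    """Randomize separately within each stratum (level of a blocking factor).

    participants: list of participant ids
    stratum_of:   dict mapping id -> stratum label, e.g. "moderate" or "severe"
    conditions:   list of treatment labels
    Returns a dict mapping id -> assigned condition.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for pid in participants:
        by_stratum[stratum_of[pid]].append(pid)

    assignment = {}
    for stratum in sorted(by_stratum):  # fixed order keeps results reproducible
        members = list(by_stratum[stratum])
        rng.shuffle(members)
        # Deal the shuffled members out in turn, so each condition receives
        # (as nearly as possible) the same number of people from this stratum.
        for i, pid in enumerate(members):
            assignment[pid] = conditions[i % len(conditions)]
    return assignment


ids = [f"p{i:02d}" for i in range(16)]
severity = {pid: ("moderate" if i < 8 else "severe") for i, pid in enumerate(ids)}
arms = ["BA", "CT", "ADM", "PBO"]  # hypothetical labels for four treatment arms
assigned = stratified_assign(ids, severity, arms, seed=7)
for level in ("moderate", "severe"):
    counts = [sum(1 for p in ids if severity[p] == level and assigned[p] == a) for a in arms]
    print(level, counts)  # each level prints [2, 2, 2, 2]
```

Each severity level ends up contributing exactly two participants to each arm. When a stratum's size is not a multiple of the number of conditions, this simple version always gives the leftover places to the first conditions in the list; a fuller implementation would randomize which arms receive them.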

In the educational context, analyses addressing the question of which interventions 
work best for which students are referred to as aptitude-treatment interaction studies 
(Snow, 1991), and this terminology has been adopted in studies of psychological 
therapies (Shoham-Salomon & Hannah, 1991). Designs that incorporate many 
treatment and client factors can attempt to analyze what treatment works best in what 
circumstances or, as Paul’s (1967, p. 111) famous question states: “What treatment,
by whom, is most effective for this individual with that specific problem, and under
which set of circumstances?” However, such large-scale designs, within what has been
called the matrix paradigm (Stiles, Shapiro, & Elliott, 1986), have serious practical 
limitations, not least because of the large number of participants required. 

Analysis of covariance designs are similar to blocked designs, but are used when it is 
known that the individual difference variable being investigated, for instance, 
psychological mindedness or severity of symptoms, has a linear relationship with the 
outcome variable. Analysis of covariance is a more powerful procedure than blocking, 
but the statistical assumptions that must be met before it can be employed are more 
restrictive. Keppel and Wickens (2004) give a useful discussion of the relative merits
of the blocking versus the analysis of covariance approach. 
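
The core of the analysis of covariance adjustment can be written out directly. The Python sketch below computes covariate-adjusted group means using the pooled within-group regression slope; the pretest and posttest scores are invented, and the groups have been made deliberately unequal on the covariate so that the adjustment is visible (in a well-randomized trial such imbalances would usually be smaller).

```python
from statistics import mean


def ancova_adjusted_means(groups):
    """Covariate-adjusted group means for a one-way analysis of covariance.

    groups: dict mapping group name -> list of (covariate, outcome) pairs.
    Uses the pooled within-group regression slope, so it assumes the
    covariate-outcome relationship is linear and has the same slope in
    every group (homogeneity of regression).
    Returns (dict of adjusted means, pooled within-group slope).
    """
    sxy = sxx = 0.0
    for pairs in groups.values():
        x_bar = mean(x for x, _ in pairs)
        y_bar = mean(y for _, y in pairs)
        sxy += sum((x - x_bar) * (y - y_bar) for x, y in pairs)
        sxx += sum((x - x_bar) ** 2 for x, _ in pairs)
    b_within = sxy / sxx
    grand_x = mean(x for pairs in groups.values() for x, _ in pairs)
    adjusted = {
        name: mean(y for _, y in pairs) - b_within * (mean(x for x, _ in pairs) - grand_x)
        for name, pairs in groups.items()
    }
    return adjusted, b_within


# Invented (pretest severity, posttest severity) pairs; lower scores are better.
data = {
    "control":   [(30, 60), (35, 70), (40, 80), (45, 90)],
    "treatment": [(20, 35), (25, 45), (30, 55), (35, 65)],
}
adjusted, slope = ancova_adjusted_means(data)
for name, m in adjusted.items():
    print(name, round(m, 1))  # control 65.0, then treatment 60.0
```

The raw posttest means differ by 25 points (75 versus 50), but most of that gap reflects the groups' different pretest levels; after adjustment the difference is 5 points. The adjustment is only as trustworthy as the linearity and equal-slopes assumptions noted in the docstring.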

Control and Comparison Groups 

The terms control and comparison group are rather loosely defined. The implication of 
the term “control group” is that some active ingredient in the experimental group is 
missing (as in agricultural experiments, where the experimental groups might be given 
various fertilizers and the control group none), whereas the term “comparison group” 
implies that a viable alternative treatment is given. We will use the term “control 
group” as a shorthand for control or comparison groups. As we discussed above in the 
section on the nonequivalent groups pretest-posttest design, several types of controls
are possible, depending on the research questions, although the selection of suitable 
control groups for psychotherapy research (and other applications) raises ethical, 
scientific, and practical problems. 

No-treatment controls, in which the control group receives no treatment at all, are
used to provide a maximum contrast with the therapy under investigation. However, 
there are serious ethical issues in withholding treatment from clinically distressed 
clients. Researchers must balance the possible harm resulting from an untested 
treatment with the denial of benefit resulting from a potentially effective one. This 
problem may not arise in a nonrandomized design, since a group of clients who might 
be unable (for geographical or other reasons) to receive the experimental treatment 
could be used as a control group. Furthermore, no-treatment controls are only useful 
in the early stages of research on a clinical problem area or diagnosis; once the 
effectiveness of psychological therapies for a clinical problem has been demonstrated, 
no-treatment controls become scientifically uninteresting and ethically questionable. 

Wait-list controls often provide a workable compromise, particularly with
short-term treatments or with mildly distressed clients. Clients who are randomly assigned
to the wait-list group are given the same initial assessment, and then placed on a 
waiting list to receive the intervention once the experimental group has completed it. 
Thus they control for the reactivity of the initial assessment, the instillation of hope, 
and spontaneous recovery. 

Expectancy and relationship control groups control for expectancies of benefit, or 
instillation of hope, and also for the effects of contact with the therapist. In drug 
trials, where patients are given a sugar pill or other pharmacologically inert substance, 
they are known as placebo control groups. However, this terminology is too imprecise 
for the psychological context: it is better to be specific about what the control group 
is intended to control for. In pharmacology, clinical trials are ideally run as
double-blind studies, where neither patients nor doctors know which experimental
condition each patient is in, or as triple-blind studies, where, in addition, the
experimenters do not know. However, even in drug trials, this is not always practicable
because patients and their physicians may be able to distinguish between active drugs
and placebos, for example, by their side effects. Even keeping clients blind to the
status of the intervention is nearly impossible in psychological applications, where any placebo control
treatment should appear as credible to clients as the experimental one (Boot, Simons, 
Stothart, & Stutts, 2013). Expectancy or relationship controls generally work well for 
clinical trials in drug research, but are questionable for psychotherapy research, where 
relationship factors or aspects of the therapeutic alliance (so-called “nonspecific 
factors” or “common factors”) are generally the best predictors of outcome (Duncan, 
Miller, Wampold, & Hubble, 2010; Norcross, 2011). 

Comparative treatment groups use a credible, established comparison treatment, 
rather than a placebo. One common approach is to compare a new treatment to
“treatment as usual” (often abbreviated to TAU), although the lack of specificity
about what this means can also be a problem. Comparative treatment groups provide 
an ethical way of doing research. Given the broad equivalence of most major forms of 
therapy, however, they are unlikely to produce statistically significant effects unless 
the sample is quite large (over 60 clients per group; Kazdin & Bass, 1989). In these
situations, the researchers’ concern is often to show comparability rather than
differences, but this requires performing statistical equivalence testing (see Chapter 12),
something that is rarely done. A more recent approach is to conduct a non-inferiority trial,
which aims to demonstrate that a new treatment is no worse than a standard treatment
by more than a prespecified margin.
For example, in the Dimidjian et al. (2006) trial that we have been using as a running
example, behavioral activation and antidepressant medication were found to be within
the range of non-inferiority (i.e., their outcomes were statistically equivalent to each
other). Occasionally, comparative treatment groups appear to have been selected in 
order to make the researchers’ favored treatment look good (sometimes pejoratively 
referred to as “intent to fail” control groups; Westen & Bradley, 2005). 
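
Two of the numerical ideas in this paragraph can be sketched in Python: the sample size needed to detect a difference between treatments, and the decision rule of a non-inferiority trial. Both functions use the normal approximation, and the defaults (two-sided α = .05 and 80% power for the sample size; one-sided α = .025 for the non-inferiority bound) are conventional choices assumed for the example rather than values given in the text.

```python
import math
from statistics import NormalDist

z = NormalDist().inv_cdf  # quantile function of the standard normal distribution


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate clients needed per group to detect a standardized mean
    difference d with a two-sided test (normal approximation)."""
    return math.ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 / d ** 2)


def non_inferior(diff, se, margin, alpha=0.025):
    """Non-inferiority decision from an estimated difference between treatments.

    diff:   mean(new) - mean(standard), scored so that higher = better
    se:     standard error of that difference
    margin: largest deficit (a positive number) judged clinically acceptable
    Concludes non-inferiority if the lower one-sided confidence bound lies above -margin.
    """
    lower_bound = diff - z(1 - alpha) * se
    return lower_bound > -margin


print(n_per_group(0.5), n_per_group(0.2))           # 63 393
print(non_inferior(diff=-0.5, se=1.0, margin=3.0))  # True
print(non_inferior(diff=-0.5, se=1.0, margin=2.0))  # False
```

For a medium standardized difference (d = 0.5) the formula gives 63 clients per group, in line with the figure of over 60 quoted above; for a small difference (d = 0.2), which is often more plausible when two credible treatments are compared, it gives 393. In a real non-inferiority trial the margin must be fixed in advance on clinical grounds, not chosen after the data are in.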

Dismantling studies aim to see what the effective components of a treatment are. 
The full treatment is compared to a comparison group, which receives that treatment 
minus one component. For example, a cognitive-behavioral treatment for fear of 
heights that includes relaxation training might be compared to a treatment that is 
exactly the same except for the relaxation component. The reverse strategy is also
possible (and logically equivalent): constructive (or additive) designs examine the effects
of adding components intended to enhance the effectiveness of a therapy.

Limitations of Randomized Designs 

Although randomized designs are scientifically valuable, they may not be
conceptually appropriate in certain situations.

• Randomization cannot be used ethically to study the impact of negative
experiences, for example, cigarette smoking, illicit drug use, disasters, or psychological
trauma. In these cases, nonrandomized experimental designs or correlational 
designs must be used (which is why there is more scope for interested parties, such 
as tobacco companies, to dispute the results). 

• Sometimes RCTs may be used to study something blindingly obvious, for which 
a randomized trial is superfluous, or worse. Smith and Pell’s (2003) tongue-in- 
cheek paper entitled “Parachute use to prevent death and major trauma related to
gravitational challenge: Systematic review of randomized controlled trials” says it
all: perhaps we shouldn’t recommend that aircrew use parachutes, since their
efficacy has never been evaluated in an RCT?

• The design requirements of RCTs may mean that they are unrepresentative of 
normal clinical practice, and therefore that their results do not generalize outside 
of the research setting. The clients may be selected according to narrow inclusion
criteria (e.g., no comorbid problems); the therapists may be highly trained
“super-therapists” with low caseloads and a strong allegiance to the treatment
under investigation. In addition, the therapies may be delivered inflexibly
according to a prespecified protocol or treatment manual.

• Randomized trials do not take account of patient choice (Bower, King, Nazareth, 
Lampe, & Sibbald, 2005; Brewin & Bradley, 1989). Outside of research studies, 
some clients will select a treatment based on their individual preferences: 
randomized trials may thus give clients a treatment that they do not want and
with which they may consequently fare less well. King et al. (2000) carried out an
interesting example of a preference trial in which clients with depression could
either consent to be randomized to one of three different therapies or express a
preference for one of them.

• Randomization does not control for one of the most pervasive factors affecting 
the results of comparative treatment studies: researcher allegiance. This is the
tendency of researchers to find positive results for the treatment that they favor; this
phenomenon has been demonstrated in a variety of studies (e.g., Elliott, Watson,
Greenberg, Timulak, & Freire, 2013; Luborsky et al., 1999).

In addition to these conceptual problems, there are also a number of practical issues 
that arise when implementing RCTs in clinical practice (Haaga & Stiles, 2000; Rossi 
et al., 2004; Shadish et al., 2002): 

• Random assignment to experimental groups does not ensure that the groups will 
be equivalent or that they will stay equivalent. Randomization is, by definition, a 
chance process, and will thus occasionally produce some unusual distributions. 
Problems of nonequivalence become more acute if the sample sizes are small or if 
there are a large number of “nuisance variables” on which the researcher is trying 
to equate the groups (Hsu, 1989). 

• Many experiments suffer from attrition, that is, some participants may drop out
of the study before the treatment is completed and the post-measures are
collected. Attrition reduces the equivalence of the experimental and control groups
(Flick, 1988; Howard et al., 1986). One potential way to deal with this is to
conduct intent-to-treat analyses, in which all participants who enter the study are
included in the analysis; one common way of handling the missing data of those who
drop out before the end is to use their latest available scores, a procedure known as
last observation carried forward (Comer & Kendall, 2013). However, such analyses are
likely to be too conservative: in order to preserve the benefits of randomization, they
focus on the effects of assigning clients to treatments. They thus sacrifice an accurate
assessment of the effects of completing treatment, which is the goal of completer analyses.

• There may be leakage between the conditions. For example, if half of the patients
on a hospital ward are taught a useful skill, such as mindfulness meditation,
they may then teach it to other patients in the control condition. In drug trials,
there is anecdotal evidence that patients in the experimental group may
sometimes share their medication with their fellow patients in the control group who
have been deprived of it.

• Other staff may not understand the need for randomization, seeing it as antithetical 
to the principle of giving individualized care to each client, and thus it may be 
hard to obtain the necessary cooperation for the study. 

• Randomized experiments are costly and time-consuming, and should therefore 
only be used where there is prior evidence that the experimental treatment is 
beneficial, or at least not inferior to existing treatments. 
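
The difference between the intent-to-treat and completer analyses described in the attrition point above can be sketched in Python. The scores below are invented (lower is better, as on a depression scale), `None` marks a missing assessment, and last observation carried forward is used only because it is the procedure named in the text; it is one of several ways of handling missing data in an intent-to-treat analysis.

```python
from statistics import mean


def locf(scores):
    """Last observation carried forward: replace each missing value (None)
    with the most recent observed value for that participant."""
    filled, last = [], None
    for value in scores:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def itt_and_completer_means(records):
    """records: one list of scores per randomized participant, pretest first.

    Returns (intent-to-treat mean of final scores with LOCF,
             completer mean using only participants with an observed final score).
    Assumes every participant has an observed pretest score.
    """
    itt_finals = [locf(r)[-1] for r in records]
    completer_finals = [r[-1] for r in records if r[-1] is not None]
    return mean(itt_finals), mean(completer_finals)


# Invented scores at pretest, mid-treatment, and posttest for three clients.
records = [[20, 15, 10], [22, 18, None], [24, None, None]]
itt, completer = itt_and_completer_means(records)
print(round(itt, 1), completer)  # 17.3 10
```

In this made-up example the completer mean (10) looks far better than the intent-to-treat mean (about 17.3), because the two participants who dropped out are carried forward at their earlier, higher scores. Which figure is more informative depends on whether the question concerns assigning clients to a treatment or completing it.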

Recognition of these conceptual and practical problems with randomized
experimental designs, which are still regarded by many as a scientific panacea, has shaped the
growing interest in nonrandomized experimental designs and in correlational designs 
(“effectiveness studies”), especially in naturalistic clinical settings (West, 2009). 

Evaluating RCTs

It is easy to feel bewildered by all the complexities of randomized designs. Few readers
are likely to be involved in conducting large treatment trials, but all practicing
psychologists need to be aware of the results from such trials in order to keep up with and
evaluate the state of the evidence. What are some of the features of RCTs that lend credibility
to their findings? A number of quality appraisal tools exist that set out checklists of
criteria for evaluating RCTs, for example, the “risk of bias” chapter in the Cochrane
handbook (Cochrane Collaboration, 2011) and Downs and Black’s (1998) checklist.

In addition to quality criteria for the design of trials, there are also guidelines for 
reporting them. The CONSORT (“Consolidated Standards of Reporting Trials”) 
statement gives authoritative recommendations on writing up journal articles 
describing RCTs (see http://www.consort-statement.org). There are also equivalent 
statements for non-randomized designs (the TREND statement:
http://www.cdc.gov/trendstatement/) and for epidemiological observational studies (the STROBE
statement: http://www.strobe-statement.org/). 

The following list summarizes the main criteria for high-quality RCTs:

• The study uses random assignment to conditions, in order to rule out selection
effects, together with an analysis to demonstrate that the groups actually
were similar after randomization.

• Specific interventions. The intervention is specified, so that it is clear what therapy 
is being delivered, it is constant across therapists, and it can be repeated by other 
investigators if necessary. This requirement has led to the manualization of
therapy and to the inclusion of fidelity checks or adherence checks to ensure that 
these treatment protocols are adhered to (Comer & Kendall, 2013). 

• Appropriate control groups are used, so that it is clear what the therapy is being 
compared with (e.g., wait-list controls, treatment as usual, or other comparative 
treatment groups). 

• The groups were treated equivalently apart from the experimental variable in 
question (e.g., they had the same length of treatment, equivalent therapists, the 
same assessments, etc.). 

• The raters and assessment interviewers were blind to the experimental condition the 
patient was in. However, it is usually not possible to conduct double-blind or
triple-blind studies in psychology, as it is in medicine, since the therapists know what they
are giving, and the clients usually know what they are getting (Seligman, 1995). 

• The clients form a specific, homogeneous group. This usually means that they meet 
criteria for a single DSM diagnosis: patients with comorbidities are often excluded. 

• There was a low attrition rate in the study. 

• The clients are followed up after the termination of therapy (e.g., at six months or
a year). 

• Demonstrations of the efficacy of a treatment are replicated by independent teams 
of researchers, thus demonstrating the portability of treatments across research 
settings. 
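
Two of the criteria above, that the groups were similar after randomization and that attrition was low, are usually checked with simple descriptive statistics. The Python sketch below computes a standardized mean difference for a baseline measure and an attrition rate for each arm; the scores and counts are invented for illustration.

```python
import math
from statistics import mean, stdev


def standardized_difference(group_a, group_b):
    """Standardized mean difference between two groups' baseline scores,
    using the square root of the average of the two sample variances."""
    pooled_sd = math.sqrt((stdev(group_a) ** 2 + stdev(group_b) ** 2) / 2)
    return (mean(group_a) - mean(group_b)) / pooled_sd


def attrition_rate(n_randomized, n_completed):
    """Proportion of randomized participants without posttest data."""
    return 1 - n_completed / n_randomized


# Invented baseline scores and completion counts for two arms.
treatment_pre = [24, 28, 31, 22, 27, 30]
control_pre = [25, 29, 30, 23, 26, 32]
smd = standardized_difference(treatment_pre, control_pre)
print(f"baseline SMD: {smd:.2f}")
print(f"attrition: treatment {attrition_rate(40, 34):.0%}, "
      f"control {attrition_rate(40, 30):.0%}")
```

Some methodologists treat an absolute standardized difference above about 0.1 as worth noting; this is a rule of thumb, not a criterion from the appraisal tools cited above. Descriptive comparisons of this kind are often recommended in preference to significance tests of baseline differences, since after proper randomization any such differences are due to chance by design.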

Some authors (e.g., Chambless & Hollon, 1998) have attempted to use criteria 
such as these in order to conclude that certain therapies can be considered as 
empirically supported treatments and others not. However, these attempts have been
controversial (Elliott, 1998; Westen, Novotny, & Thompson-Brenner, 2004). Key
issues include the use of criteria that favor some treatments over others and the
dismissal of all non-RCT research. While we regard RCTs as potentially powerful research
designs, we believe that nonrandomized designs also have a place, particularly when 
they use naturalistic client populations or clinical settings. It seems unproductive to 
force all treatment research into the same mold, as different research designs have 
complementary strengths and weaknesses—no single design provides a “royal road” 
for evaluating therapy outcomes. 


Conclusion: Choosing a Research Design 

The central issue for researchers is to choose a design appropriate to the research 
questions and to the stage of the research program and research area. In the early 
stages of investigating the treatment of a problem, or where there has been little
previous research, you are probably unsure of the nature of the phenomena you are
looking at. Furthermore, a newly established clinical service may not be stable: its
operational policies and modus operandi may be constantly changing (Patton, 2008;
Rossi et al., 2004). In these cases, a simple descriptive or correlational design is usually 
better. Later on, when you are clearer about what the important variables are and how 
they interrelate, and are more able to specify the nature of the treatment, you can 
proceed to the next stage by using a one-group pretest-posttest design or systematic 
case study designs (see Chapter 9). Then, if the treatment appears effective and 
resources and circumstances allow, you can move on to more sophisticated randomized 
experimental designs to test efficacy, perhaps adding correlational components in 
order to pin down the effects of crucial variables and to test competing theoretical 
models (Campbell et al., 2007). Also, several studies taken together can often help 
eliminate specific competing theoretical explanations. 

We have drawn heavily on Cook and Campbell’s (1979) analysis of threats to 
validity (see also Shadish et al., 2002). Their central theme is that no research design 
is perfect: the important thing is to be aware of the strengths and weaknesses of
whatever design you decide to adopt. As we have discussed above, it is important not to
read Cook and Campbell as saying that research designs must have no validity
problems at all, or that certain designs are automatically flawed (Schmidt & Hunter, 2015).
Their message is to do the best that you can in the circumstances, but to be aware of 
potential problems that may arise later on in interpreting the findings. Thus, research 
designs require careful planning and analysis in order to anticipate the potential results 
and competing explanations. 


CHAPTER SUMMARY 

Design, in the restricted sense in which we are using it here, refers to the logical
structure of the study. It encompasses such issues as whether there is a control group and
whether the data are collected at one time point or several. The central classification 
is into experimental and nonexperimental designs, which depends on whether or not 
the researcher is making an active intervention (also known as an experimental
manipulation). 

Nonexperimental designs can be classified into descriptive and correlational 
designs, according to the type of analysis conducted. The golden rule when
interpreting the results from nonexperimental designs is that “correlation does not equal
causation”: if two variables are correlated there is not necessarily a causal relationship 
between them. There are several possible models of the causal relationships of two or 
more variables, including mediator and moderator variable models, as well as 
conceptual confounding between cause and effect variables. 

The work of Campbell and colleagues (e.g., Cook & Campbell, 1979; Shadish
et al., 2002) has made two major contributions to the study of research design: (1) the
classification of validity types; and (2) the analysis of quasi-experimental designs. 
Cook and Campbell (1979) analyze several nonrandomized (quasi-experimental) 
designs, examining the validity threats associated with each one. 

Cook and Campbell’s four validity types are statistical conclusion validity (which 
assesses the appropriateness of the statistical methods), internal validity (which assesses 
the evidence for the existence of causal relationships), construct validity (which assesses 
the meaning of the measurement operations, including the experimental
manipulation), and external validity (which assesses the study’s generalizability).

Experimental designs are classified into randomized and nonrandomized (quasi- 
experimental) designs, according to whether or not there is random assignment to 
experimental conditions. There are several commonly used nonrandomized designs. 
Each has its associated validity threats. 

Randomized experimental designs have the potential to address many of the validity 
problems associated with nonrandomized designs and thus to facilitate inferences 
about causality. They are central to the discussion of evidence-based practice and 
empirically supported therapies. However, they do have practical, scientific, and ethical 
limitations, and should not be regarded as a scientific panacea. 


FURTHER READING 

Cook and Campbell’s (1979) ideas on causation, on validity, and on the different
experimental and correlational designs are essential reading for all serious researchers; their
classic book has been updated by Shadish et al. (2002), but the original is still worth 
reading. Christensen, Johnson, and Turner (2013) give an overview of design from 
the standpoint of experimental psychology. A statistical treatment of experimental 
designs is given in the standard texts, such as Field (2014), Howell (2002), and Winer 
et al. (1991). Comer and Kendall (2013) illustrate the issues as applied to therapy 
outcome research. Seligman’s (1995) paper on the Consumer Reports effectiveness 
study, together with subsequent commentary in the American Psychologist (October 
1996, volume 51, issue 10), airs the central issues in the efficacy versus effectiveness 
debate. Special issues of Psychotherapy Research (1998, volume 8, issue 2) and of 
the Journal of Consulting and Clinical Psychology (1998, volume 66, issue 1) both debate 
the research issues raised by the empirically supported therapies movement, and 
Westen, Novotny, and Thompson-Brenner (2004) provide a thoughtful commentary. 
Persons and Silberschatz (1998) set out their contrasting views, using a debating 
format, about the value of RCTs for clinicians. 

As in other areas of research, it is well worth reading some sample studies. The ones 
that we have looked at in this chapter provide a good starting point: Brown and 
Harris (1978) for a descriptive, correlational design, Stiles et al. (2008) for a quasi- 
experimental design, and Dimidjian et al. (2006) for a randomized controlled trial. 


QUESTIONS FOR REFLECTION 

1. Use the Cook and Campbell (1979) design validity framework to identify 
problems that could arise in a study you would like to do. 

2. Which designs do you see as most useful for studying the outcome of psychotherapies? 
Why? 

3. An interesting variant of the nonequivalent comparative treatment design, 
mentioned in this chapter, is the “preference trial” in which clients are allowed to 
choose between two available treatments. Think about the pros and cons of this 
design. 

4. Try to identify an aspect of your research topic that lends itself to experimental 
manipulation. Alternatively, if you’re already planning an experimental study, 
think about particular experimental designs that could be used. 

5. What is your view of the Empirically Supported Treatments debate? Take a 
position pro or con and defend it. 

KEY POINTS IN THIS CHAPTER 

• Small-N designs follow the idiographic approach of looking in depth at the 
individual. 

• They are often appealing to clinicians, as a way of combining research and 
practice. 

• They derive from several different traditions: narrative case studies in neuropsychology 
and medicine, operant behaviorism, Shapiro’s single-case approach, 
and idiographic research in personality theory. 

• There are two main types of design: single-case experiments and naturalistic 
case studies. 

• Single-case experiments are characterized by frequently administered measurements 
and the experimental manipulation of an intervention, in which 
participants serve as their own controls. They are most often used within 
operant behavioral approaches to therapy. 

• Naturalistic case studies range along a continuum of approaches with varying 
degrees of rigor, from narrative approaches, such as Freud’s, to more structured 
studies using systematic measurement of process and outcome and 
time-series studies using sophisticated statistics. 


Small-N designs, such as systematic case studies and single-case experiments, are a 
potentially appealing way of blending science and practice, since they enable clinicians 
to integrate formal research methods into their everyday work (Hayes et al., 1999; 
Kazdin, 2011). From the practitioner’s point of view, the advantages of small-N 
research are that it is usually inexpensive, not very time-consuming, and, more importantly, 
that its underlying philosophy is often congenial to practitioners, since it 
addresses individual uniqueness and complexity. 

Recall the nomothetic versus idiographic distinction that we introduced in Chapter 4. 
Nomothetic methods compare across individuals, looking for general patterns or laws; 
idiographic methods look intensively within a single individual, to gain greater understanding 
of that person’s unique personality or psychological responses. Nomothetic 
approaches, particularly the large-group experimental and quasi-experimental designs 
that we examined in Chapter 8, have long been criticized on the grounds that individual 
variation and uniqueness get submerged by the process of averaging across a larger 
group (Dukes, 1965). 

For example, in a seminal paper on psychotherapy research, Kiesler (1966) drew 
attention to the existence of “uniformity myths”: the implicit assumption by 
researchers that clients are all similar, that different therapists each deliver an 
identical intervention, and so on. For instance, in an outcome study, the overall 
difference between the mean pre-therapy score and the mean post-therapy score on 
a depression measure may indicate that the therapy is beneficial. However, as Bergin 
(1966) originally pointed out, such positive mean changes may conceal the fact 
that, although most clients have improved, a significant minority have deteriorated 
(Barlow, 2009). Such differential responses would not be discovered without challenging 
the uniformity myth and paying attention to each individual client’s unique 
pattern of change. 

As a second example, in neuropsychological case-control research, client heterogeneity 
may obscure important effects (Shallice, 1979). Clients may vary in their 
responses to neurological lesions according to such factors as age, premorbid functioning, 
or the size of the lesion, and this variation will only become apparent if one 
looks in detail at each single case. 

Small-N designs can therefore make up for some of the drawbacks of nomothetic, 
group-comparison designs. The idiographic approaches described in this chapter provide 
ways of rigorously examining individuals’ responses, particularly in the context of 
evaluating psychological interventions. 


Historical Background 

Small-N studies were the dominant paradigm in medicine and in psychology until the 
beginning of the 20th century. Before that time, statistical theory barely existed. 
Then, in the early decades of the century, Pearson and Fisher developed methods such 
as correlation and the analysis of variance. These methods were originally developed 
within the agricultural context, in order to systematically assess the yields of different 
fertilizers or strains of wheat, but they were rapidly adopted in medicine and psychology. 
In agricultural research, large samples and group-comparison designs work 
very well; examining the response of individual plants is less relevant. However, the 
agricultural metaphor does not translate easily to clinical psychology, where individual 
differences are often of major importance. In recognition of this, there has been a 
resurgence of small-N methods in the past 30 years, deriving its impetus from several 
different traditions. 

Historical traditions of small-N research: 

• Single-case studies in medicine and neuropsychology 

• Operant behaviorism 

• Shapiro’s single-case approach 

• Idiographic personality research 


First, there is the narrative case study, which is a continuation of the long tradition of 
descriptive case studies in medicine. The first published studies of psychological 
therapy were case studies, those of Sigmund Freud being the outstanding example. 
Often case studies are reported to illustrate the development of new theoretical 
approaches. For example, both Rogers’s (1951) Client-Centered Therapy and Beck’s 
(1976) Cognitive Therapy and the Emotional Disorders exemplify their theoretical 
ideas by presenting case illustrations, including excerpts of verbatim transcripts from 
therapy sessions. 

Arising out of the same medical tradition is the single-case study in neuropsychological 
research. Luria (1973) dates the birth of scientific neuropsychology to 1861, 
when Broca described a case of speech impairment that was associated with a localized 
lesion of the brain. The famous individual cases of Phineas Gage and of the amnesia 
patient known as “HM” contributed to the development of the discipline. Case studies 
continue to play an important role in the development of the area, and currently seem 
to be enjoying a resurgence of popularity (Crawford, 2014; Evans, Gast, Perdices, & 
Manolov, 2014; Shallice, 1979). Methodologically, examples range from qualitative 
narrative case studies (e.g., Sacks, 1985) to case series studies using intensive 
quantitative neuropsychological test data (e.g., Rapp, 2011). 

Second, there is the tradition of applied behavior analysis (i.e., operant behaviorism). 
Skinner (quoted in Barlow & Nock, 2009) said that “instead of studying a 
thousand rats for one hour each or a hundred rats for ten hours each the investigator 
is more likely to study one rat for a thousand hours.” In his view, the goal of behavioral 
science is “to predict and control the behavior of the individual organism” (Skinner, 
1953, p. 35). Single-case experimental designs, aimed at demonstrating such prediction 
and control, were first developed in the 1950s and 1960s (Davidson & Costello, 
1969), and studies using these designs proliferated in the 1970s. The Journal of 
Applied Behavior Analysis continues to be devoted to publishing examples of this kind 
of work. 

Third, innovative measurement methods for single-case designs were pioneered in 
the United Kingdom by Monte Shapiro, who developed a measurement technique 
known as the Shapiro Personal Questionnaire, which enables each patient’s problems 
to be quantified and monitored on a day-by-day or week-by-week basis (Phillips, 
1986; Shapiro, 1961a, 1961b). In contrast to the operant work, Shapiro’s approach 
takes a more phenomenological stance, being tailored to the individual client’s view 
of his or her problems, and it is less concerned with the experimental manipulation 
of treatments. A simplified version of the Personal Questionnaire is a key element in 
contemporary single-case research (Elliott et al., 2015). 

Fourth, there is the idiographic tradition in personality research. Allport (1962) 
passionately criticized psychology’s almost exclusive reliance on nomothetic methods: 
“Instead of growing impatient with the single case and hastening on to generalization, 
why should we not grow impatient with our generalizations and hasten to the 
internal pattern?” (Allport, 1962, p. 407). Murray (1938) developed an approach to 
studying personality based on intensive investigation. Proposition one of his theory, 
which captures the key idea of this chapter, is “The objects of study are individual 
organisms, not aggregates of organisms” (Murray, 1938, p. 38). 

The terminology of the area partly reflects these different traditions. Single-case 
designs (also referred to as N = 1 designs) are characterized by repeated measures on a 
single case. They usually involve an experimental manipulation of a treatment, although 
there are quasi-experimental versions, such as time series designs. Studies that do not 
use intensive repeated measures and do not have an experimental manipulation, for 
example, the classic case-history approach, are referred to as case studies (Bromley, 
1986; Dukes, 1965), although the boundary between the two approaches is not always 
clear. This chapter will focus first on single-case experimental designs and then on case 
studies. Although the focus is on design, we also include some suggestions about 
measurement, because small-N designs call for some specific measurement approaches. 


SINGLE-CASE EXPERIMENTAL DESIGNS 

Single-case experimental designs are used to test an experimental intervention on a 
single individual. The essence of the design is that each individual participant serves as 
their own control. Single-case experiments also involve repeated measurements, which 
allow the process of change to be closely monitored. A key assumption of these designs 
is that they are based on an adequate functional (i.e., causal) analysis of the problem, 
providing an understanding of the situational variables (cues and reinforcers) which 
appear to be controlling the problem behavior (Haynes & O’Brien, 2000; Morgan & 
Morgan, 2001). Such a behavioral case formulation is established during a preliminary 
behavioral assessment period. The experimental design then serves as a test of the 
functional analysis. 


Procedure 

As with group-comparison designs, the first step is to select the measure, or measures, 
to be used. In single-case experimental designs, the measures need to be capable of 
frequently repeated administration (Smith, 2012): that is, they must be brief and minimally 
reactive. The two most common types are observer ratings (e.g., staff ratings of 
a patient’s self-injurious behavior on an in-patient ward) and the client’s own ratings 
from self-monitoring (e.g., of their obsessional thoughts). Having chosen the measure, 
the next step is to select an appropriate frequency of measurement: usually it is 
daily, but it may also be, say, hourly or weekly, depending on the nature of the behavior 
being monitored. 

All single-case experimental designs start with a series of baseline measures. These 
are measures taken to establish the level of the target variable before the clinical 
intervention is introduced. They continue until the measurements are stable, preferably 
for 10 to 20 observations. After that, the first experimental treatment phase 
begins. These designs have their own special notation, based on the first few letters of 
the alphabet: A stands for the baseline, or no-treatment phase; B, C, D, etc. stand for 
the various treatments or interventions. There are many possible single-case designs, 
each of which raises practical and sometimes ethical issues. We will look at four 
commonly used ones here; more elaborate designs are given in specialist textbooks 
(e.g., Barlow, Nock, & Hersen, 2008; Hayes et al., 1999; Kazdin, 2011). 


Figure 9.1 The AB design (vertical axis: problem severity) 


Common single-case designs: 

• AB design 

• Reversal (ABAB) design 

• Multiple-baseline design 

• Changing-criterion design 


AB Design 

The AB design (see Figure 9.1), in which the baseline is followed by an intervention, is 
the simplest form of single-case experiment. For example, the effectiveness of a positive 
parenting approach to manage a six-year-old girl’s tantrums might be investigated. The 
parents would be asked to observe the number and severity of their daughter’s tantrums 
(suitably operationalized) every day for two weeks (this constitutes the baseline, or A, 
phase of the design). Then, in the intervention, or B, phase of the design, they would 
be taught a new way of responding to their daughter, such as time-outs for the tantrums 
and praise for good behavior. If the intervention is effective, there will be a reduction in 
the target problem behavior’s severity and frequency. 
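To make the AB logic concrete, the following minimal sketch compares the baseline (A) and intervention (B) phases of the tantrum example. The daily counts are invented for illustration and are not data from any study. The sketch reports each phase's mean and the change between them. On its own, such a difference is only weak evidence that the intervention caused the change, for the reasons discussed next.

```python
from statistics import mean

# Hypothetical daily tantrum counts (invented for illustration).
# Phase A: two-week baseline; phase B: two weeks of the new parenting approach.
baseline = [4, 5, 3, 4, 6, 5, 4, 5, 4, 3, 5, 4, 6, 4]      # A phase, 14 days
intervention = [4, 3, 3, 2, 2, 3, 1, 2, 1, 1, 2, 0, 1, 1]  # B phase, 14 days


def phase_summary(a_phase, b_phase):
    """Return the mean of each phase and the change from A to B."""
    mean_a = mean(a_phase)
    mean_b = mean(b_phase)
    return {"mean_A": mean_a, "mean_B": mean_b, "change": mean_b - mean_a}


summary = phase_summary(baseline, intervention)
print(f"Baseline mean:     {summary['mean_A']:.2f} tantrums/day")
print(f"Intervention mean: {summary['mean_B']:.2f} tantrums/day")
print(f"Change (B - A):    {summary['change']:+.2f}")
```

With these made-up numbers the mean falls from about 4.43 to about 1.86 tantrums a day. In practice the graph would also be inspected for trend and variability within each phase, not just the phase means.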

The drawback of the AB design is that, in the absence of other information, it only 
gives weak evidence for the causal influence of the experimental treatment. It suffers 
from many of the same threats to internal validity as the one-group pretest-posttest 
design (Cook & Campbell, 1979; see also Chapter 8), for example, that an interfering 
event may occur at the same time as the treatment is introduced (e.g., the girl could 
make a new friend at school, and so be happier, which reduces her need to tantrum). 
Figure 9.2 The ABAB design (vertical axis: problem severity) 


More elaborate designs have therefore been developed to try to overcome this 
problem, and, in particular, to build in opportunities to replicate the treatment effect, 
thus increasing its credibility. 


Reversal (or ABAB) Design 

The reversal (or ABAB) design is an AB design immediately followed by its own replication 
(see Figure 9.2). It is the classic operant behavior modification design. For 
instance, in the child’s tantrum example above, once the intervention had been shown 
to be effective, the baseline, no-treatment, phase would be reinstituted, followed finally 
by the intervention again. The rationale is that changes in frequency of the target 
behavior after these reversals demonstrate the experimental control of the intervention. 
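The reversal logic can be expressed as a simple check. If the intervention controls the behavior, the level should fall each time treatment starts (A to B) and rise each time it is withdrawn (B to A). The sketch below applies this check to phase means. It uses invented ABAB data for the tantrum example and is a simplification of visual analysis, not a formal test.

```python
from statistics import mean

# Hypothetical ABAB data for the tantrum example (invented numbers).
# Each tuple is (phase label, daily tantrum counts in that phase).
phases = [
    ("A1", [5, 4, 5, 6, 4, 5, 5]),
    ("B1", [3, 2, 2, 1, 2, 1, 1]),
    ("A2", [3, 4, 4, 5, 4, 4, 5]),
    ("B2", [2, 1, 1, 1, 0, 1, 0]),
]


def reversal_pattern(phases):
    """For each adjacent pair of phases, report whether the mean moved in the
    direction predicted if the intervention controls the behavior:
    down when treatment starts (A -> B), up when it is withdrawn (B -> A)."""
    results = []
    for (label1, data1), (label2, data2) in zip(phases, phases[1:]):
        m1, m2 = mean(data1), mean(data2)
        treatment_starts = label2.startswith("B")
        as_predicted = m2 < m1 if treatment_starts else m2 > m1
        results.append((f"{label1}->{label2}", round(m1, 2), round(m2, 2), as_predicted))
    return results


for transition, m1, m2, ok in reversal_pattern(phases):
    print(f"{transition}: {m1} -> {m2} {'as predicted' if ok else 'NOT as predicted'}")
```

Each predicted reversal that occurs counts as a replication of the effect within the same client.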

There are also more complicated variants of this design, for example, the ABACAB 
design in which a second intervention, C, is introduced after the second baseline phase. 
For example, a token economy on an in-patient ward might have its contingencies 
modified in the C phase. 

The ABAB design suffers from three major problems. First, the effects of many interventions 
are not reversible. In other words, clients do not automatically relapse when 
treatment ends: permanent learning or personality change may occur, or the problem 
may not recur once it has been dealt with. Thus, this design could not be used to study 
the impact of psychodynamic or cognitive therapy, for example, because these therapies, 
if successful, effect irreversible changes in the way that clients think and feel about themselves. 
The design’s expectation of reversibility is based on the assumption that external 
processes control behavior. Second, even if the intervention is reversible, there are 
serious ethical problems with the withdrawal of treatment in the second and subsequent 
baseline phases. This problem is similar to the ethical dilemma faced in having no-treatment 
control groups in group-comparison designs, but it is more acute because in 
this case treatment is withdrawn rather than withheld. For example, when the intervention 
is withdrawn, the design is “successful” if the child reverts to having tantrums, or if 
psychiatric in-patients recommence self-injurious behaviors. Thus, the design creates a 
conflict of interest between clinical and scientific goals. Third, switching the intervention 
on and off may have undesirable psychological consequences. For example, it may 
lead to the client’s losing trust in the therapist, or may even make the problem behavior 
harder to extinguish because it has been maintained on a partial reinforcement schedule. 

Figure 9.3 The multiple-baseline design 


Multiple-baseline Design 

With several (presumed independent) target behaviors (e.g., a child who suffers from 
tantrums, nocturnal enuresis, and dog phobia) or one target behavior in several 
independent settings (e.g., aggression at home, in the classroom, and on the playground), 
you can use a multiple-baseline design. Similar interventions targeted at 
each behavior, or in each setting, are introduced sequentially and their impact on all 
the target behaviors is measured (see Figure 9.3). The idea is to replicate the effect of 
the intervention on each particular problem or setting. For example, Chadwick and 
Trower (1996) used this design to investigate the effects of cognitive therapy for 
paranoid delusions. The intervention was targeted sequentially at the client’s negative 
self-evaluation and two separate delusional ideas, and each of these problems was 
reduced in severity in the predicted order. 
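The staggered structure of the design can be shown with a small scheduling sketch. It uses the hypothetical child with tantrums, nocturnal enuresis, and dog phobia described above. The seven-day baseline and seven-day stagger are arbitrary assumptions. At each stage, the behaviors still in baseline act as controls for the behavior that has just been targeted.

```python
# Hypothetical example: three target behaviors for one child, each treated in turn.
# Start days are staggered so that untreated behaviors act as ongoing baselines.
BASELINE_DAYS = 7  # assumed length of the first baseline (illustrative)
STAGGER_DAYS = 7   # assumed gap between successive interventions (illustrative)
behaviors = ["tantrums", "enuresis", "dog phobia"]

start_day = {b: BASELINE_DAYS + i * STAGGER_DAYS for i, b in enumerate(behaviors)}
# -> tantrums start on day 7, enuresis on day 14, dog phobia on day 21 (0-indexed days)


def phase_on(day, behavior):
    """Return 'A' (baseline) or 'B' (intervention) for a behavior on a given day."""
    return "B" if day >= start_day[behavior] else "A"


total_days = BASELINE_DAYS + len(behaviors) * STAGGER_DAYS
for day in range(0, total_days, STAGGER_DAYS):
    print(day, {b: phase_on(day, b) for b in behaviors})
```

If each behavior improves only after its own intervention starts, the effect has been replicated three times. If the untreated behaviors also improve early, the results are ambiguous. That pattern matches the generalization problem discussed next.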

Barlow et al. (2008) interestingly fit the famous early psychoanalytic case of Anna 
O. (Breuer & Freud, 1895/1955) into this schema, since Breuer targeted various 
separate interventions, such as hypnosis and interpretation, at each of Anna O.’s 
symptoms in turn. However, this design assumes that changes will not generalize 
from one problem or setting to other problems or settings—that is, like the previous 
design, it is based on the behaviorist assumption that behavior is situationally specific. 
Thus, although the within-participant version of this design is theoretically amenable 
to nonbehavioral therapies, it is not really applicable to therapies that aim for general 
change, in spite of claims to the contrary (Morgan & Morgan, 2001). 

An extension of this design involves replication across multiple cases, which is a 
special case of a clinical replication series (Hayes et al., 1999; see section below on 
generalization). For example, Bennun and Lucas (1990) used a version of this design 
to investigate the impact of a two-component intervention with couples in which one 
partner had a long-standing diagnosis of schizophrenia. Using a “multiple single-case 
design” with a sample of six couples, they showed that the first component of the 
intervention—education—had an impact on the well spouse’s perception of their 
ability to cope, but had no effect on presenting symptoms. The second component of 
the intervention—problem solving and communication training—had an impact on 
positive symptoms of schizophrenia. 

Figure 9.4 The changing-criterion design 

Changing-Criterion Design 

This design is used to demonstrate experimental control over a single problem behavior 
that may be progressively reduced in severity (see Figure 9.4). It is particularly useful 
in working with clients who are dependent on drugs or alcohol. For example, it may 
be used in a smoking-cessation program, in which the client progressively cuts down 
to more stringent targets (e.g., Criterion 1 would be 20 cigarettes a day, Criterion 2 
would be 15 a day, and so on down). Or it could be used with a positive behavior that 
is being shaped, for instance, appropriate social interaction in a child with severe 
autistic behaviors. 
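In this design, experimental control is shown when the behavior follows each successive criterion down. The minimal sketch below uses the smoking example. The criteria step down by five cigarettes a day, as in the text. The daily counts and the "proportion of days meeting the criterion" summary are illustrative assumptions, not a standard index.

```python
# Hypothetical smoking-cessation example of a changing-criterion design.
# Criteria step down by 5 cigarettes a day, as in the example in the text;
# the daily counts are invented for illustration.
criteria = list(range(20, -1, -5))  # [20, 15, 10, 5, 0]

daily_counts = {
    20: [19, 20, 18, 20, 17],
    15: [15, 14, 15, 13, 14],
    10: [10, 9, 11, 9, 8],
    5: [5, 4, 5, 3, 4],
    0: [0, 1, 0, 0, 0],
}


def proportion_meeting(counts, criterion):
    """Proportion of days on which smoking was at or below the current criterion."""
    return sum(c <= criterion for c in counts) / len(counts)


for criterion in criteria:
    p = proportion_meeting(daily_counts[criterion], criterion)
    print(f"Criterion {criterion:2d}/day: met on {p:.0%} of days")
```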


Data Analysis 

Data from single-case designs are normally displayed on a graph, similar to those in 
Figures 9.1 to 9.4. Part of the appeal of these designs is that the success or failure 
of the intervention is usually immediately obvious from the graph (Morley & 
Adams, 1991). It can often be helpful to show such graphs to the clients, to enable 
them to monitor their progress and to demonstrate clearly that the intervention is 
working. An emerging use, discussed in Chapter 11, is to provide feedback to therapists 
(Shimokawa, Lambert, & Smart, 2010), especially to alert them when the 
treatment is not going well, but also to provide encouragement when positive 
progress is occurring. 

However, in some cases, the changes may be less clear cut, or a measure of their 
magnitude may be required. This has led some researchers to call for the use of 
statistical methods in single-case designs. The topic of which, if any, statistical tests to 
use for extended time series of related observations is too technical to cover here: 
Morley and Adams (1989) and Smith (2012) outline some possibilities. 
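The sources above should be consulted for the range of options. The sketch below shows just one simple descriptive index that has been proposed for single-case data, the percentage of nonoverlapping data (PND). PND is the proportion of intervention-phase points that fall beyond the most extreme baseline point. It is our choice of example, not a method recommended in this chapter. It ignores trend and serial dependence, which is part of why the statistical question is contentious. The data are hypothetical.

```python
def pnd(baseline, intervention, improvement="decrease"):
    """Percentage of nonoverlapping data (PND).

    For a problem behavior expected to decrease, count the intervention-phase
    points that fall below the lowest baseline point; for a behavior expected
    to increase, count points above the highest baseline point.
    """
    if not baseline or not intervention:
        raise ValueError("both phases need at least one observation")
    if improvement == "decrease":
        threshold = min(baseline)
        nonoverlapping = sum(x < threshold for x in intervention)
    elif improvement == "increase":
        threshold = max(baseline)
        nonoverlapping = sum(x > threshold for x in intervention)
    else:
        raise ValueError("improvement must be 'decrease' or 'increase'")
    return 100 * nonoverlapping / len(intervention)


# Hypothetical daily tantrum counts (invented for illustration).
baseline = [4, 5, 3, 4, 6, 5, 4, 5, 4, 3]
intervention = [4, 3, 2, 2, 1, 2, 1, 0, 1, 1]
print(f"PND = {pnd(baseline, intervention):.0f}%")
```

Because PND depends on a single extreme baseline point, one unusually low baseline observation can make a clear improvement look small. It is therefore a rough supplement to the graph, not a substitute for it.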


Generalization 

Although single-case studies are essentially idiographic, the investigator often wishes 
to generalize beyond the specific individuals studied in order to make broader claims 
about the effectiveness of the treatment tested. This can be done by conducting a 
clinical replication series (Hayes et al., 1999), that is, by replicating the same study on 
several individuals. In this way the external validity of the findings is strengthened. 
(See Chapter 10 for further discussion of generalization issues.) The notion of a 
clinical replication series is derived from Cronbach’s (1975) concept of locally intensive 
observation. As a finding is tested in other settings, varying conditions will test the 
limits of its external validity and lead to richer theory: 

As [the researcher] goes from situation to situation, his first task is to describe and 
interpret the effect anew in each locale, perhaps taking into account factors unique to 
that locale. ... As results accumulate, a person who seeks understanding will do his 
best to trace how the uncontrolled factors could have caused local departures from 
the modal effect. That is, generalization comes late and the exception is taken as seriously 
as the rule. (Cronbach, 1975, p. 125) 

Such an approach can be applied equally well to experimental and to naturalistic, 
nonexperimental studies. 


NATURALISTIC CASE STUDY DESIGNS 

As we have noted, behaviorally oriented researchers (e.g., Morgan & Morgan, 2001) 
often claim that experimental single-case designs can readily be adapted to nonbehavioral 
treatments. However, the emphasis on observable events and experimental 
manipulation makes most of these designs problematic for studying psychodynamic, 
experiential, and cognitive therapies, which focus on cognitions and emotions, and 
often lead to irreversible changes in the client. Naturalistic case-study designs—the 
narrative case study, the systematic case study, and time-series designs—are usually 
more appropriate to these types of therapy. 


Narrative Case Studies 

The narrative case study is the traditional description of a client or treatment, based 
on the clinician’s case notes and memory. Freud’s case histories, such as “Little Hans” 
(Freud, 1909/1955) or “Dora” (Freud, 1905/1953), are classic examples of this 
genre. Case studies have played an important role in the development of the 
psychological therapies. They can serve a number of purposes (Barlow et al., 2008; 
Dukes, 1965; Lazarus & Davison, 1971). These include (1) documenting the 
existence of a clinical phenomenon, often a rare one (e.g., early case studies of multiple 
personality disorder), (2) disproving a universal proposition by demonstrating a 
counter-example (e.g., the proposition that only women suffer from hysteria could be 
disproved by documenting the case of a man with hysteria), (3) demonstrating a new 
intervention, and (4) generating hypotheses about causes. Valuable information can 
be gathered from case studies, as long as their nature and limitations are understood. 
In general, case studies can tell us what is possible, but not what is typical. Similarly, 
they can suggest a possible connection or cause, but cannot provide strong confirmatory 
evidence for it. 

Uses of narrative case studies: 

• Documenting the existence of a phenomenon 

• Disproving a universal proposition 

• Demonstrating a new intervention 

• Generating causal hypotheses 


However, Spence (1986) and others have argued that narrative case studies such as 
Freud’s contain too much “narrative smoothing”: that is, they are too selective and 
have often been altered (either deliberately or unconsciously) to tell a better story. 
Narrative distortions can be investigated by a self-experiment on one’s own clinical 
work (see box below). 

Like the one-group posttest-only design that we discussed in Chapter 8, narrative 
case studies can be used to infer possible causal explanations if sufficient additional 
information is available. In psychohistorical case studies (see Chapter 6), for example, 
Runyan (1982) points out that careful consideration of the known facts often allows 
the researcher to rule out most of the possible competing explanations for an event. 


Self-experiment on narrative distortion 

Audio-record a therapy session. A day later (or even an hour later), write down 
from memory a brief chronological account of what happened during the 
session. Then, listen to the recording while taking detailed notes and noting any 
inaccuracies. In addition to large amounts of missing material, you will also find 
that you have collapsed things that happened at different times, got some things 
out of order and may have attributed statements to the wrong speaker or even 
completely fabricated things. 


Systematic Case Studies 

Given the problems with narrative case studies (reliance on memory, anecdotal data 
collection, narrative smoothing), it is worth considering how to improve the quality 
of information from case studies, in order to strengthen the conclusions which may 
be drawn from them. Methodologists such as Kazdin (1981, 2011) and Hayes et al. 
(1999) have considered more systematic approaches to single-case research on clinical 
interventions, and have proposed the following general features for improving their 
credibility: 

• systematic, quantitative (versus anecdotal) data; 

• multiple assessments of change over time; 

• multiple cases; 

• change in previously chronic or stable problems; and 

• immediate or marked effects following the intervention. 

The combination of these features substantially improves the researcher’s ability to 
infer that a treatment caused an effect (i.e., it increases the internal validity of the 
study). Note that the first three features are design strategies over which the researcher 
has some control, while the last two (previous stability and discontinuous change) are 
outcomes specific to the particular case. 

As part of a recent renewal of interest in systematic case studies (e.g., McLeod, 2010; 
Miller, 2004), several writers have recently proposed adding further elements of good 
practice in case study research to this list. In his development of Hermeneutic Single-Case 
Efficacy Designs, Elliott (2002) has advocated expanded single-case designs that take an 
interpretive approach to examining client change and its causes. These designs aim to: 
(1) demonstrate that change occurred; (2) examine the evidence for concluding that 
therapy was responsible for the change; (3) examine alternative explanations for the 
change; and (4) examine which processes in therapy might have been responsible for 
change. They emphasize the use of a rich case record of comprehensive information on 
therapy outcome and process (e.g., using multiple perspectives, sources, and types of 
data), and critical reflection by the researcher, who systematically evaluates the evidence. 

A number of procedures, involving varying degrees of time and effort, may be used. 
We will address each of the above four areas, giving suggestions for how systematic 
case studies could be carried out by practicing clinicians. A good example of a 
systematic case study which illustrates many of the design features described here is 
Parry et al.’s (1986) study of “the anxious executive” (see box). 


Example of a systematic case study: “The anxious executive” 
(Parry et al., 1986) 

Parry et al. (1986) present a systematic case study of a senior manager who sought 
help for anxiety and depression related to stress at work and in his marriage. The 
case was drawn from a large research project examining psychotherapy outcome. 
Using multiple quantitative measures, including the Shapiro Personal 
Questionnaire, as well as therapist and client session-by-session accounts, the study 
examined in detail the changes that occurred over the course of therapy. It was able 
to identify the characteristics of sessions that had particular short-term outcomes, 
both positive and negative. It offers a good example of the potential strength of 
systematic case studies for providing a rich description of process and outcome. 


Demonstrating that Change Occurred 

The task here is to improve upon anecdotal impressions of client improvement or 
deterioration. There are several options, which we have roughly ordered from least to 
most time-consuming, so that clinicians may begin with a minimum requirement and 
work up to more elaborate procedures. 

• Administer a simple standardized measure, tailored to the particular client’s 
problem, before and after therapy. For example, give the Generalized Anxiety 
Disorder scale (GAD-7: Kroenke et al., 2007) to an anxious client. 
• Add an individualized measure, before and after therapy (Mintz & Kiesler, 
1982; Sales & Alves, 2012). Such measures ask clients to identify the major problem 
areas that they want to change, and to rate the severity of these problems. For 
example: goal attainment scaling (Kiresuk & Sherman, 1968), the target 
complaints procedure (Battle et al., 1966) or the Personal Questionnaire in 
both its original (Phillips, 1986; Shapiro, 1961a) and simplified versions (Elliott 
et al., 2015). 

• Use additional standardized measures, covering a broader range of variables.
These may include measures of clinical or interpersonal distress, for example, the
CORE Outcome Measure (Barkham et al., 2001), the Inventory of Interpersonal
Problems (Horowitz, Rosenberg, Baer, Ureno, & Villasenor, 1988), or a global 
symptom inventory such as the SCL-90-R (Derogatis, 1994). 

• Add more assessment points, for example at mid-treatment (or every 5 to 10
sessions) or at follow-up (e.g., six months after treatment). A session-by-session
measurement procedure, known as case tracking (Leach & Lutz, 2010), can also
be extremely useful. It ensures that some measure of outcome is collected if the
client drops out of treatment, and, more importantly, gives feedback to therapist
and client about the progress of therapy (Shimokawa et al., 2010). Any brief,
easy-to-complete client- or therapist-rated measure can be used for this purpose,
including many of the measures mentioned above.

• Use a qualitative approach. As McLeod (2011) has argued, outcome has both
qualitative and quantitative elements, and qualitative interviews may be more
sensitive to negative or unexpected effects, as well as allowing researchers to
evaluate the plausibility of clients’ claims to have changed. The Change Interview
(Elliott, 1999; Elliott, Slatick, & Urman, 2001) is one example of a semi-structured
qualitative outcome interview that can be given every 5 to 10 sessions, and at the
end of therapy.

• Add further cases, creating a clinical replication series (Hayes et al., 1999). 
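
As a sketch of how the simplest option above (a standardized measure given before and after therapy) can be taken beyond an impressionistic comparison, the following Python code computes the reliable change index (RCI) of Jacobson and Truax: the pre-post difference divided by the standard error of the difference, with an absolute value above 1.96 conventionally read as change unlikely to be due to measurement error alone. The reference standard deviation and reliability below are hypothetical placeholders, not published norms for the GAD-7 or any other instrument; in practice they would come from the measure's normative data.

```python
import math


def reliable_change_index(pre: float, post: float, sd_ref: float, reliability: float) -> float:
    """Reliable change index: observed change divided by the standard error of the difference.

    sd_ref: standard deviation of the measure in a reference sample (assumed known).
    reliability: reliability coefficient of the measure (assumed known).
    """
    se_measurement = sd_ref * math.sqrt(1 - reliability)  # standard error of measurement
    s_diff = math.sqrt(2) * se_measurement                  # SE of a pre-post difference score
    return (post - pre) / s_diff


# Hypothetical client scores on a symptom measure where lower scores mean fewer symptoms.
rci = reliable_change_index(pre=16, post=7, sd_ref=5.0, reliability=0.85)
status = "reliable improvement" if rci <= -1.96 else "change not shown to be reliable"
print(f"RCI = {rci:.2f}: {status}")
```

Because the sign depends on the direction of scoring, a measure on which higher scores indicate improvement would use RCI >= 1.96 as the criterion instead.
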

Linking Change to the Therapy

Here the task is to provide evidence to support a causal link between therapy and 
client outcome. As in single-case experimental designs, your conclusions are more 
credible if data suggesting causal links can be replicated within the case. Such evidence 
may also help to identify the effective ingredients of the intervention. Forms of
potential evidence may include:

• Client self-report about therapeutic effectiveness. This may include general client 
satisfaction measures (e.g., the Client Satisfaction Questionnaire; Larsen et al., 
1979) or measures that identify specific helpful aspects of therapy (e.g., the 
Helpful Aspects of Therapy form; Llewelyn, 1988). 

• Significant within-case correlations between theoretically relevant within-session 
processes (e.g., therapeutic alliance) and session outcomes. 

• Qualitative evidence of important within-therapy events immediately preceding 
client improvements (e.g., particular themes addressed within a therapy session 
are followed by changes related to those themes). 

• Evidence of reliable change in stable or chronic problems. 
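
The second kind of evidence listed above, a within-case correlation between a session-level process measure and session outcome, needs nothing more than a Pearson correlation computed across sessions. The Python sketch below uses invented ratings for a single client, and the variable names are illustrative. One caution: successive sessions of the same client are not independent observations, so the conventional t test shown here is likely to be optimistic; the time-series methods discussed later in this chapter deal with that dependence more properly.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length with at least 3 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical ratings for one client, one value per session (sessions 1-10).
alliance = [4.0, 4.5, 3.5, 5.0, 5.5, 4.0, 6.0, 5.5, 6.5, 6.0]
session_outcome = [2.0, 3.0, 2.0, 3.5, 4.0, 2.5, 4.5, 4.0, 5.0, 4.5]

r = pearson_r(alliance, session_outcome)
df = len(alliance) - 2
t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)  # conventional test; assumes independent sessions
print(f"within-case r = {r:.2f}, t({df}) = {t:.2f}")
```
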

Evaluating Alternative Explanations

In addition to evaluating evidence that links client change to therapy, it is also
important to search systematically for evidence that nontherapy processes may account for
change. Cook and Campbell’s (1979) list of internal validity threats (see Chapter 8) 
can be used; in addition, Elliott (2002) highlights validity threats that are specific to 
single-case studies. For example, Elliott proposes eight nontherapy explanations for 
apparent client change. The first four involve the possibility that the client has not 
really improved: 

• Nonimprovement. Apparent changes are trivial or even negative. 

• Statistical artifacts. Apparent changes reflect statistical artifacts, such as measurement 
error or regression to the mean. 

• Relational artifacts. Apparent changes reflect attempts to please the therapist or 
researcher. 

• Client expectations. Apparent changes reflect client expectations or wishful thinking. 

The next four explanations assume that client improvements are real, but that
nontherapy factors account for them:

• Self-correction. Changes are due to client self-help efforts independent of therapy, 
or the self-limiting nature of short-term or temporary problems. 

• Extra-therapy factors. Changes result from life events outside of therapy, such 
as changes in relationships or work, or from help obtained from friends or 
family. 

• Psychobiological factors. Changes are caused by medication or other remedies, or 
by recovery from physical illness. 

• Reactive effects of research. Changes can be attributed to taking part in research, 
including interactions with research staff, altruism, and increased self-monitoring. 

The researcher’s task is to systematically evaluate both positive evidence (in favor of 
therapy as the cause of change and against nontherapy factors) and negative evidence 
(against therapy as the cause of change and in favor of nontherapy explanations). This 
weighing of both sides is analogous to political debate or legal proceedings, and it can 
be carried out by the researchers themselves or by independent judges. Case study 
researchers including Elliott et al. (2009) and Miller (2011) have developed
adjudicational or legalistic procedures for weighing the complex, often contradictory
information collected using systematic case study methods. 

Examining Therapy Process 

There are a variety of systematic ways to assess therapeutic process, that is, what
happens in a session and the client’s reactions to that session (see Greenberg & Pinsof,
1986). Such information helps to elucidate the nature of therapeutic relationships and 
has the potential for generating theory about the mechanisms of change (Kazdin, 2007; 
Laurenceau, Hayes, & Feldman, 2007). Stiles (2007) proposes a theory-building model
of research in which a general theoretical understanding (e.g., of the process by which 
clients come to assimilate problematic experiences in therapy) is constructed and 
elaborated through a series of single-case studies. Process data can therefore also be
used to support inferences about linking client change to the therapy (see section 
above). Some ways of examining therapy processes include: 

• Records of therapy sessions. Audio or video recordings, or detailed process notes, 
are an excellent source of information about what actually happens in sessions. 
(They can also be used to corroborate or clarify self-report data.) 

• Therapeutic relationship measures can be administered every session, or less
frequently (e.g., every three to five sessions). The most widely used such measure
today is the revised short form of the Working Alliance Inventory (Hatcher & 
Gillaspie, 2006). 

• Self-report session measures can be completed by the client or the therapist. 
The Helpful Aspects of Therapy Form (Llewelyn, 1988) is a qualitative measure 
of client perceptions of significant therapy events. The Session Evaluation 
Questionnaire (Stiles, 1980) and the Session Impacts Scale (Elliott & Wexler, 
1994) are quantitative measures of clients’ immediate reactions to sessions. 

• Orientation-specific measures, completed by the therapist or the supervisor after 
each session, can be used to assess the therapist’s adherence to the treatment 
model (e.g., the Revised Cognitive Therapy Scale: Blackburn et al., 2001). 


Time-Series Designs 

The final example of naturalistic case study designs is the time-series design (Borckardt
et al., 2008). The aim of this design is to evaluate causal processes using correlational
methods. Two or more variables are monitored over time and their interrelationship 
is examined statistically; a large number of observations is needed in order to meet the 
statistical assumptions behind the analysis. These designs originated in econometrics, 
where, for example, the effect of one year’s interest rates on the following year’s 
economic activity may be analyzed using monthly data over 25 years, which yields 300 
data points. 

Gottman and his co-workers have promoted these methods within clinical
psychology in general and in the study of psychological therapies in particular (e.g.,
Gottman, 1981; Gottman & Roy, 1990). Complex statistical methods are needed to 
assess the evolving relationships within and between variables (Borckardt et al., 2008;
Gottman, 1981). An interesting application was Moran and Fonagy’s (1987) use of 
time-series methods to study the process and impact of child psychoanalysis on an 
adolescent girl with diabetes. They demonstrated an association between certain 
psychoanalytic content themes, for instance, the girl’s anger with her father, and the 
study’s principal outcome variable, variations in her blood glucose level. 
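
The core question in an analysis like Moran and Fonagy's, whether changes in one series tend to be followed by changes in another, can be illustrated with a lagged cross-correlation. The Python sketch below is only an exploratory first step under stated assumptions: both series are invented (a 0/1 code for whether a theme was discussed in each period, and a physiological outcome index), and a proper time-series analysis of the kind described by Gottman and by Borckardt et al. would first model the autocorrelation within each series rather than correlating raw values.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_cross_correlation(driver, outcome, lag):
    """Correlate driver at time t with outcome at time t + lag (lag >= 0)."""
    if len(driver) != len(outcome) or not 0 <= lag < len(driver) - 2:
        raise ValueError("series must be equal length and lag must leave at least 3 pairs")
    return pearson_r(driver[: len(driver) - lag], outcome[lag:])


# Hypothetical data for 12 consecutive observation periods.
theme_present = [0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0]
outcome_index = [5.0, 4.9, 7.1, 4.9, 5.1, 6.9, 7.1, 4.9, 7.1, 4.9, 5.1, 6.9]

for lag in range(3):
    r = lagged_cross_correlation(theme_present, outcome_index, lag)
    print(f"lag {lag}: r = {r:+.2f}")
```

In this invented series the association is strongest at lag 1, that is, the outcome tends to shift in the period after the theme appears. That is the pattern one would look for if the theme were influencing the outcome, although a lagged correlation on its own still falls short of demonstrating causation.
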


CONCLUSION 

Small-N designs thus represent both a way to look at individual uniqueness and
complexity, and also a viable research method for practicing clinicians. Like all research
methods, they have their strengths and weaknesses. They are good for looking at 
phenomena in depth, demonstrating that certain phenomena exist, or disconfirming
theories by providing counterexamples. They are poor at establishing typicalities or
general laws. 

In line with our methodological pluralism stance, we would argue that a thorough 
investigation of any topic area needs to combine both large-N and small-N approaches. 
It is possible, even desirable, to examine single cases within the context of a larger 
group-comparison study. Rogers’s (1967) classic case of “A silent young man” is 
taken from a larger experimental study, as is Parry et al.’s (1986) case of “The anxious 
executive.” These two examples both give a human dimension that is lacking in the 
predominantly statistical reports from the larger projects. Beyond this, Stewart and
Chambless (2010) showed that practicing therapists are more likely to be influenced
by case study data than by statistically significant group results.


CHAPTER SUMMARY 

There is a central distinction between nomothetic and idiographic approaches to 
research. Nomothetic designs look at groups of individuals (see Chapter 8);
idiographic designs look at separate individuals in depth. Idiographic designs, often called
small-N, single-case, or N = 1 designs, can be appealing to clinicians as a way of
combining research and practice. They derive from several different traditions:
narrative case studies in neuropsychology and medicine, operant behaviorism, Shapiro’s
single-case approach, and idiographic research in personality theory. Although
varieties of case study design exist along a continuum of measurement and experimental
control, they can be roughly grouped into two main types of design: single-case
experiments and naturalistic case studies.

Single-case experiments are most often used within operant behavioral approaches 
to therapy, to demonstrate the intervention’s control over a problem behavior. They 
are characterized by frequently administered measurements and the experimental 
manipulation of an intervention. They have a baseline phase before the intervention 
is introduced; the participant thereby serves as their own control. 

Naturalistic case studies range from narrative approaches, such as Freud’s, to more 
structured studies using systematic measurement of process and outcome. Several 
authors have articulated criteria for increasing the credibility of case studies. A number 
of procedures can be used to demonstrate that client change occurred and that it was 
linked to the therapy. 


FURTHER READING 

Most of the references on single-case experimental designs cover similar ground. 
Barlow et al. (2008) and Kazdin (2011) are the two standard textbooks; there are 
good chapter-length treatments by Gaynor, Baird, and Nelson-Gray (1999) and
McMillan and Morley (2010). Hayes et al. (1999) and Morgan and Morgan (2001) set
these designs against a background of scientist-practitioner professional issues, while 
Smith (2012) reviews published research and gives some suggested standards. 
McLeod (2010) and Yin (2009) discuss case studies as a general research method.
It is a good idea to read some of the classic narrative case studies, both from a research 
and a clinical point of view. Any of Freud’s are worthwhile: “Little Hans” (Freud, 
1909/1955) or “Dora” (Freud, 1905/1953) provide a good starting point. On the 
behavioral side, there is Watson and Rayner’s (1920) famous (and ethically dubious) 
case of Little Albert. Carl Rogers pioneered the use of audio recording to study
client-therapist interaction in single cases: his case of “A silent young man” (Rogers, 1967)
is an excellent example of a process-oriented narrative case study. Parry et al. (1986) 
and Moran and Fonagy (1987) are interesting examples of case studies using more 
intensive quantitative methods. McLeod (2010) provides a useful summary of a new 
generation of systematic case study methods; Elliott et al. (2009) provide a detailed 
example of one of these, using an adjudicated interpretive approach. 

The Barkham et al. (2010) edited volume on practice-based evidence, in which, to 
declare an interest, the present authors all have chapters, is a valuable collection of 
ways that practitioners can generate useful data from their day-to-day clinical work. 

More information on the measures mentioned in this chapter can be obtained from 
the references cited; several can be obtained from the website of the Network for 
Research on Experiential Therapies (http://experiential-researchers.org). 


QUESTIONS FOR REFLECTION 

1. What, if any, conclusions do you think can be drawn from Freud’s case studies? 
What could he have done to make them more convincing? 

2. Under what circumstances, if any, do you think it is permissible to generalize the 
results of a case study with an individual client beyond the specific individual 
studied? 

3. If you haven’t already done so, carry out the exercise described earlier in this 
chapter in which you compare process notes composed from memory with the 
recording of the session. What kinds of discrepancies are there? 

4. Think about a recent client you have worked with. How could you have
documented what kinds of changes occurred and why?


KEY POINTS IN THIS CHAPTER

• Sampling refers to the process of obtaining the participants for the study. 

• It involves specifying the target population, choosing the sampling procedure, 
and determining the sample size. 

• The external validity of a study is the degree to which its results can be 
generalized. 

• In most types of research, it is important to have an unbiased sample that is 
representative of the target population from which it is drawn. 

• In quantitative research, sample size is determined by statistical power analysis. 

• There are several alternative approaches to sampling, used in small-N 
and qualitative research. 

• Ethical principles are concerned with protecting the rights, dignity, and welfare
of research participants. 

• The central ethical issues are informed consent, minimization of harm, 
and confidentiality. 


The final aspect of design concerns the participants in the research. It addresses the 
“who?” question that we posed in Chapter 8: who will you be studying, and to whom 
do you intend to apply the findings of the study? We will also consider ethical issues 
here, since they concern the researcher’s relationship with the participants. 

We usually prefer the term “participants” to the old-fashioned, but still current, 
term “subjects.” The latter term, with its monarchic connotations, has undesirable 
implications of powerlessness and passivity. The stock phrase “running the subjects”
is especially to be avoided: one of our students once wrote something like “the
subjects were run in their own homes,” which conjures up an image of indoor
jogging. For interviews and questionnaires, you can speak of “respondents” or
“interviewees.” (For observational research, the term “observees” has not yet taken off.) In
ethnography, the term “informants” is typically used, although this has unfortunate 
connotations of surreptitiousness. New paradigm researchers (e.g., Reason & Rowan, 
1981) and participatory action researchers (e.g., Jason et al., 2004) may use the term
“co-researchers,” to emphasize the idea of the participant as an equal partner in a 
collaborative research enterprise. 

This chapter has two separate sections: sampling and ethics. We have placed this 
material here, after the chapters on measurement and the first two chapters on design, 
because some of the issues to be considered depend on knowledge of those topics. 
Furthermore, beginning researchers often focus on the population and how it will be 
sampled before they have formulated what they will be studying. However, some of 
the issues covered in this chapter will inevitably need to be thought about during the 
groundwork phase of the project, after choosing the area to be investigated and 
developing the research questions (see Chapter 3). Problems of access to populations 
are bound up in some of the organizational and political issues that we discussed in 
that chapter. At an extreme, if there is no sample available, there is no study. 


SAMPLING 

Sampling refers to the process of specifying and obtaining the participants for the 
study. There are three steps: (1) specifying the target population; (2) choosing the 
sampling procedure; and (3) determining the sample size. Usually the steps are
sequential, though they can be iterative. Sometimes, for example, the sample size can influence
the sampling procedure. We will deal with each in turn, and then consider some 
alternative approaches. Although we will mainly be using language associated with the 
quantitative research tradition, we intend our discussion to have a general application. 
Qualitative researchers may sometimes be less concerned about representativeness, but 
we contend that all researchers must decide, implicitly or explicitly, how to respond to
these sampling issues. 

It helps to think in terms of three nested sets (see Figure 10.1): 

• The universe is the broad population to which eventual generalization of the 
findings is desired. 

• The target population is the defined group from which the participants in the 
study are to be selected. 

• The sample is the subset of the target population consisting of those participants 
who actually take part in the study. There may be a gap between the ideal and the 
actual sample: the terms intended versus achieved sample can be used to denote this. 

For example, you may be interested in the prevalence of depression in British women
who consult their general practitioners (family doctors). In this case, the universe may
be all British women who visit their general practitioners, the target population all
women who consult 10 specific doctors in September 2015, and the intended sample
one in 20 of those women. The achieved sample will be the subset of women actually
interviewed. In the case of a census, the universe, the target population, and the intended
sample are one and the same (e.g., in a national census they consist of all members of
that country’s population), although the achieved sample will fall short of this, as some
people are inevitably missed out.

Figure 10.1 The universe, the target population, and the sample

A quantity calculated from a sample is called a statistic; it is usually used
to estimate a population parameter. For instance, the prevalence of depression in a
sample of 163 women visiting their general practitioners is a statistic; this may be used 
to estimate the overall prevalence of depression in the target population of women 
users of those practices, which is a parameter. 
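
To make the distinction between a statistic and a parameter concrete, the Python sketch below takes a hypothetical count of depressed women in the sample of 163 (the figure of 33 is invented for illustration) and computes the sample prevalence together with a simple normal-approximation (Wald) 95% confidence interval for the target-population prevalence. Such an interval only addresses the first type of generalization discussed below, from an unbiased sample to its own target population; it says nothing about other populations. For small samples or prevalences near 0 or 1, a better-behaved interval such as Wilson's would be preferable.

```python
import math


def prevalence_with_ci(cases: int, n: int, z: float = 1.96):
    """Sample proportion (the statistic) and a Wald confidence interval for the parameter."""
    if n <= 0 or not 0 <= cases <= n:
        raise ValueError("need n > 0 and 0 <= cases <= n")
    p = cases / n
    se = math.sqrt(p * (1 - p) / n)  # estimated standard error of a proportion
    return p, (max(0.0, p - z * se), min(1.0, p + z * se))


# Hypothetical: 33 of the 163 sampled women meet criteria for depression.
p, (lower, upper) = prevalence_with_ci(33, 163)
print(f"statistic: {p:.3f}; 95% CI for the parameter: {lower:.3f} to {upper:.3f}")
```
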

Generalizability

Usually researchers are not just interested in the specific sample itself; rather they want 
to extend the findings to other groups. The extent to which this is possible is referred 
to as the external validity of the study (Cook & Campbell, 1979). External validity is 
captured by the question, “Does it generalize?” or, more fully, “To what extent do the
results of my study apply beyond the specific people, situations or incidents in the 
sample, to others like them?” Of course, generalizability is not just a matter of 
sampling, since it also involves consideration of the setting, the time, the measures, 
and so on. We consider these aspects of external validity later on, when we discuss 
analysis and interpretation in Chapter 12. 

From a purely sampling point of view, there are two types of generalization, 
corresponding to the transitions from one subset to the next in Figure 10.1. The first 
type is generalizing from the sample to the target population. In quantitative research
this is known as statistical inference, and there is a well-established set of procedures 
to accomplish it. However, these procedures make certain assumptions, such as
unbiased sampling from the target population, which we will examine below. The second
type is generalizing from the target population to another population or to a larger 
universe. This is done on the grounds of general plausibility, rather than any statistical 
argument. For example, can the results of a study of socially anxious patients seen at 
hospital X in Los Angeles be generalized to hospital Y in New York? To people with 
social anxiety who do not seek help? To other countries or cultures? If these groups 
of people are plausibly similar enough, then the results can be generalized across 
them; if not, then the findings must be considered as specific to the original target 
population until replications in other populations are conducted. 

In the case of qualitative and small-N research, the argument for generalizability 
always depends upon plausibility, in a similar way to the second step of generalization 
in quantitative methods. Some qualitative researchers reject the whole notion of 
generalizability across populations (see Chapter 12). 

The importance of external validity depends on the type of research. Basic research on 
general human processes assumes external validity, since it seeks universal generalizability. 
Applied research may also seek to generalize, though often less widely, for example, to a 
particular client group. For evaluation and action research, external validity is often less 
important, since the research seeks an understanding of, and solutions to, a particular 
problem in a particular setting, and seeks generalizability only to the immediate future. 


The Target Population 

The first step in sampling is to specify the target population. It can be defined in terms 
of, for instance, gender, social class, problem type, or problem severity. The definitions 
are usually phrased in terms of specific inclusion or exclusion criteria. The population may
be defined narrowly (e.g., married women aged 35–45 living within the Liverpool city
boundaries with no significant medical or psychiatric history) or broadly (e.g., all 
British women aged 18 and over). Narrowly defined populations are called homogeneous, 
broadly defined ones heterogeneous. 
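
Inclusion and exclusion criteria are, in effect, a screening rule applied to each potential participant. The Python sketch below encodes the narrow and broad definitions from the example above as two predicate functions. The `Candidate` fields and the small screening pool are hypothetical, and a real protocol would operationalize terms such as "significant medical or psychiatric history" far more precisely.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    sex: str
    age: int
    british: bool
    married: bool
    lives_in_liverpool: bool
    significant_history: bool  # significant medical or psychiatric history


def meets_narrow_criteria(c: Candidate) -> bool:
    """Homogeneous definition: married women aged 35-45 in Liverpool, no significant history."""
    return (c.sex == "female" and 35 <= c.age <= 45 and c.married
            and c.lives_in_liverpool and not c.significant_history)


def meets_broad_criteria(c: Candidate) -> bool:
    """Heterogeneous definition: British women aged 18 and over."""
    return c.sex == "female" and c.british and c.age >= 18


# Hypothetical screening pool (fields in the order declared above).
pool = [
    Candidate("female", 38, True, True, True, False),
    Candidate("female", 52, True, True, True, False),
    Candidate("female", 40, True, False, True, False),
    Candidate("male", 40, True, True, True, False),
    Candidate("female", 44, True, True, True, True),
]
narrow = [c for c in pool if meets_narrow_criteria(c)]
broad = [c for c in pool if meets_broad_criteria(c)]
print(f"screened {len(pool)}: {len(narrow)} eligible (narrow), {len(broad)} eligible (broad)")
```

Even in this tiny pool the narrow rule admits far fewer people, which is one of the practical costs of homogeneity discussed in this section.
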

Researchers must make a trade-off when deciding upon the breadth of the target 
population. A homogeneous sample has the advantage of reducing the degree of 
extraneous variability (i.e., statistical noise) in the sample, which gives more power to 
detect effects that you are interested in and more precision in estimating the magnitude 
of those effects. In analysis of variance terms, homogeneity reduces the proportion of 
error variance to total variance. For example, if you are researching the influence of 
stressful life events on depression, any relationship will be harder to detect in a more
heterogeneous sample, since depression is a function of many variables other than life events.

On the other hand, the increase in precision from a narrow definition of the target 
population is bought at the expense of the following: 

1. There will be reduced generalizability to a larger universe (e.g., if you are studying
women in their thirties, the findings will not necessarily apply to women of all age 
groups). 

2. Practical difficulties will result, including the problem that the more stringent
the inclusion criteria, the harder it is to find participants, since more people have
to be screened or you have to get referrals from more specialized services.

3. Having a homogeneous sample precludes examining individual differences. For
example, if there is little variability in age within your sample, you cannot look at
age as an individual difference variable.
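
The trade-off described above can be put into numbers with a simple model, offered here as an illustrative assumption rather than a model of real depression data: suppose depression scores Y depend linearly on life events X plus an independent term E that lumps together all the other influences, Y = b·X + E. The population correlation is then r = b·SD(X) / √(b²·SD(X)² + SD(E)²), so anything that increases SD(E), such as a more heterogeneous sample, shrinks the correlation and raises the share of variance that is error. The Python sketch below evaluates this formula for a few hypothetical values.

```python
import math


def expected_correlation(slope: float, sd_predictor: float, sd_other: float) -> float:
    """Population correlation of X and Y when Y = slope * X + E, with E independent of X."""
    signal_var = (slope * sd_predictor) ** 2
    return math.copysign(math.sqrt(signal_var / (signal_var + sd_other ** 2)), slope)


# Same effect of life events (slope 1, SD 1); other influences vary more as the sample broadens.
for label, sd_other in [("homogeneous", 1.0), ("moderate", 2.0), ("heterogeneous", 3.0)]:
    r = expected_correlation(slope=1.0, sd_predictor=1.0, sd_other=sd_other)
    print(f"{label:>13}: r = {r:.2f}, error share of variance = {1 - r ** 2:.2f}")
```

The same formula shows a related cost: restricting the range of X itself (a smaller SD(X)) also lowers the correlation, so homogeneity helps only when it reduces variation in the extraneous influences rather than in the variable of interest.
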


Bias and Representativeness 

In order to make inferences from the sample to the target population from which it is 
drawn, the sample should ideally be an unbiased sample of that population. This means 
that every member of the target population should have an equal chance of being 
selected for the sample. A number of different sampling techniques may be used to 
generate a representative sample (Minke & Haynes, 2011; Sudman, 1976). For
example, in probability sampling every member of the target population has a given 
chance, say one in 10, of being included in the study; whereas in stratified sampling, 
the target population is first subdivided into groups, for instance, according to social
class or diagnostic variables, and participants are then selected at random within each group.
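
The two procedures just described can be sketched in Python as follows. The population of 1,000 people labeled by social class is hypothetical, and the function names are illustrative. The first function gives every member the same independent chance (here one in 10) of inclusion, so the achieved sample size varies from draw to draw; the second subdivides the population into strata and then draws the same fraction at random within each stratum, so each group is represented in proportion to its size (proportionate stratification, one of several variants).

```python
import random
from collections import Counter, defaultdict


def probability_sample(population, fraction, rng):
    """Include each member independently with the same probability."""
    return [member for member in population if rng.random() < fraction]


def stratified_sample(population, stratum_of, fraction, rng):
    """Split the population into strata, then sample the same fraction at random within each."""
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical target population: (id, social class) pairs.
population = (
    [(i, "A") for i in range(600)]
    + [(i, "B") for i in range(600, 900)]
    + [(i, "C") for i in range(900, 1000)]
)
rng = random.Random(42)  # fixed seed so the example is reproducible

simple = probability_sample(population, 0.1, rng)
stratified = stratified_sample(population, lambda m: m[1], 0.1, rng)
print(len(simple), Counter(cls for _, cls in stratified))
```
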

Psychologists are typically careless about sampling methods: they tend to rely on 
convenience sampling (i.e., whoever they can get) and hope that their results will 
generalize, if the sample is large enough. However, it is wrong to assume that a sample 
large enough to have sufficient statistical power is large enough to ensure
generalizability. No matter how large the sample is, you can only generalize safely if the sample is
representative of the target population. A sample of 5000 male college students still 
does not allow you to generalize your findings to a population of female factory workers. 

However, eliminating bias is not always feasible. Even with a well-designed sampling 
plan, there is usually a gap between the intended and the achieved sample. For example, 
research using postal questionnaires often has at least a 30% nonresponse rate (Dillman 
et al., 2009). Nonresponders usually differ considerably from responders, in terms of 
interest, motivation, educational level and so on. Similarly, studies which recruit
volunteers via advertisements or the internet may get an unrepresentative sample.

Sometimes, it is possible to estimate the nature of the sampling bias and partially 
control for it statistically when you analyze the data. For example, if respondents are 
older on average than nonrespondents, you can look at the association of age with 
whatever variable you are studying, and possibly use partial correlations to remove its 
influence. However, as we discussed under the nonequivalent groups pretest-posttest 
design in Chapter 8, post-hoc statistical adjustments can only partially compensate for 
a biased sample, because of unreliability of measurement and because you can never 
fully compensate for all possible variables on which bias may occur. Such post-hoc 
analyses are often worth doing, but must be treated with caution. 
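
For the simple case of one variable on which responders and nonresponders differ, the partial correlation mentioned above has a closed form. The Python sketch below computes the first-order partial correlation between a study variable X and an outcome Y controlling for a third variable Z (for example, age); the three input correlations are hypothetical. As cautioned above, this removes only the linear influence of Z as measured, so unreliability in Z, or bias on variables that were never measured, is left untouched.

```python
import math


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation of X and Y with the linear influence of Z removed from both."""
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denominator == 0:
        raise ValueError("partial correlation is undefined when |r_xz| or |r_yz| equals 1")
    return (r_xy - r_xz * r_yz) / denominator


# Hypothetical correlations: X with Y, X with age, and Y with age.
print(round(partial_correlation(r_xy=0.40, r_xz=0.30, r_yz=0.50), 3))
```
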

Another serious drawback of the convenience sampling approach is that minority 
populations may be underrepresented. For example, Graham (1992) analyzed the 
characteristics of participants in studies published in the major American Psychological 
Association journals. She concluded that all too often papers reported that “most of 
the subjects were white and middle class” and that psychological research has ignored 
black and ethnic minority participants. In the same vein, Arnett (2008) has also 
pointed out that most papers in these journals focus narrowly on Americans, to the
exclusion of the rest of the world’s population, and Henrich, Heine, and Norenzayan
(2010) describe the typical research participant as WEIRD, that is, from Western,
educated, industrialized, rich, democratic societies. 


Sample Size 

From the point of view of inferential statistics, the obvious rule of thumb is that the 
larger the sample is, the better, since you are then more able to separate out the 
variance associated with the effects you are interested in from the variance due to 
errors of sampling and measurement. In other words, with a large sample you are 
more able to separate the signal from the noise. However, as Cohen (1990) has 
pointed out, a sample can be too large, in the sense that it exceeds the requirements 
for statistical power (see below), thus involving a waste of research effort, and it is also 
likely to identify trivially small effects. If you are fortunate enough to be well funded, 
a better strategy may be to carry out several smaller studies on different populations, 
rather than one large one. 
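
Cohen's point that a very large sample will flag trivially small effects as significant can be checked directly. The Python sketch below uses Fisher's z transformation with a normal approximation (a standard large-sample approach, chosen here for brevity) to obtain a two-sided p-value for a correlation of 0.05, an effect that accounts for only 0.25% of the variance, at two hypothetical sample sizes.

```python
import math
from statistics import NormalDist


def correlation_p_value(r: float, n: int) -> float:
    """Two-sided p-value for H0: rho = 0, via Fisher's z and a normal approximation."""
    if n <= 3 or not -1 < r < 1:
        raise ValueError("need n > 3 and -1 < r < 1")
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


for n in (100, 10_000):
    print(f"r = 0.05, n = {n:>6}: p = {correlation_p_value(0.05, n):.2g}")
```

With the larger sample the same negligible correlation falls far below the conventional 0.05 threshold, which is why statistical significance alone says little about whether an effect matters.
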

The attainable sample size will also depend on practical issues, such as recruitment 
difficulties, time constraints, finances, and the rarity of the condition studied. 

Statistical Power Analysis 

The main way of estimating the appropriate sample size is known as statistical power 
analysis (Cohen, 1988, 1990, 1992). In a nutshell, the statistical power of a study is the
likelihood that it will detect an effect that is actually present, for example, a difference in 
effectiveness between two treatments. It is analogous to the power of a microscope in 
laboratory research. Just as a study using a low-magnification microscope will miss out 
fine details, so a low-power study in psychology will have a low chance of detecting 
subtle effects; conversely, a high-power study will have a good chance. Many studies in
clinical psychology have simply not been powerful enough and thus may have overlooked 
the presence of important effects (Cohen, 1990; Kazdin & Bass, 1989). 

In any study, there are four related parameters. For any given statistical test, if you 
know any three of them, you can calculate the fourth. 

• The sample size (N) is usually what you want to determine, but, if you know it in 
advance, you can calculate the size of effect that the study is powered to detect. 

• Alpha (α) is the probability of detecting an effect when in fact none exists (this is
called a Type I error or false positive). In most psychological research, alpha is set
by arbitrary convention at 0.05, but a more lenient value of 0.10 is sometimes 
used for exploratory research or defining nonsignificant trends. On the other 
hand, more stringent values (e.g., 0.01 or 0.001) may be used to increase the 
confidence in one’s findings or to control for the effects of conducting multiple 
tests of statistical significance. 

• Beta (β) is the probability of missing an effect which is in fact present (this is
called a Type II error or false negative). Statistical power is defined as 1 minus
beta (1 − β): it is the probability of detecting an effect that is really there.
As Cohen (1988, 1992) recommends, the standard, widely used level for power 
is 0.80. It is inadvisable to design a study whose power is less than 0.50; you are
wasting your time and that of your participants if you design a study that has less 
than a 50-50 chance of finding an effect that is present. 

• Effect size is the key concept in power analysis. It is a measure of the strength of 
the underlying relationship that you are interested in. Effect sizes are usually 
talked about in terms of small, medium, and large effects. A large effect can be 
thought of as one that is large enough to see with the naked eye—that is, without 
statistical analysis. The way the effect size is calculated depends on the type of 
statistical method used in the study (e.g., chi-square, t-test, correlation, or analysis 
of variance). This is reviewed by Cohen (1988, 1992), who presents rough 
standards for what amounts to a small, a medium, and a large effect with each 
type of statistical test (see also Vacha-Haase & Thompson, 2004). For example, 
in correlational studies, a Pearson correlation coefficient of 0.10 is considered to 
be a small effect, 0.30 a medium effect, and 0.50 a large effect. Clinical psychology 
researchers usually deal with medium effect sizes, though small effects may be of 
interest in epidemiological research. Note that effect size is not the same as 
clinical significance (see Chapter 12); an effect may be large but trivial, if the 
variable which shows the effect is trivial. 
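As an illustration of how an effect size is computed for the common two-group case, Cohen's d is the difference between the group means divided by the pooled standard deviation. The following is a minimal Python sketch using only the standard library; the scores are invented and the function names are our own. The 0.2/0.5/0.8 cut-offs are Cohen's conventional rough benchmarks for d, analogous to the 0.10/0.30/0.50 benchmarks for r given above.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(group_a, group_b):
    """Standardized mean difference (group_a minus group_b) using the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    # statistics.stdev uses the sample (n - 1) denominator
    var_a, var_b = stdev(group_a) ** 2, stdev(group_b) ** 2
    pooled_sd = sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (mean(group_a) - mean(group_b)) / pooled_sd


def label_d(d):
    """Cohen's rough benchmarks for d: 0.2 small, 0.5 medium, 0.8 large."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"


# Invented post-treatment symptom scores (lower = fewer symptoms)
treated = [10, 12, 14, 16, 18]
control = [12, 14, 16, 18, 20]
d = cohens_d(treated, control)
print(f"d = {d:.2f} ({label_d(d)})")
```

The sign of d only reflects which group is listed first; the benchmarks are applied to its absolute value. Here d is about −0.63, a medium effect.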

In order to estimate the required sample size for your study, you need to carry out a 
statistical power calculation. For this, you first need to select your alpha and beta 
levels and establish your effect size. As discussed above, the most commonly adopted 
values are an alpha of 0.05 and a power of 0.80. A rough estimate of the effect size 
can be obtained from previous research or theoretical knowledge of the topic area. 
It is usually worth trying out the calculation for a range of effect size estimates. 
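The calculation itself can be sketched in Python with only the standard library. This is a hedged sketch, not the exact procedure behind Cohen's tables or G*Power: it uses a normal approximation to the two-sample t-test plus a common small-sample correction term (z²/4 per group), and it assumes a two-sided test and equal group sizes. The function names `n_per_group` and `approx_power` are our own. Because any three of the four parameters determine the fourth, the same formula can be run in either direction.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample t-test (equal groups)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_beta = STD_NORMAL.inv_cdf(power)            # about 0.84 for power = 0.80
    n = 2 * ((z_alpha + z_beta) / d) ** 2         # normal approximation
    n += z_alpha ** 2 / 4                         # small-sample correction term
    return ceil(n)


def approx_power(n, d, alpha=0.05):
    """Approximate power of the same test with n participants per group."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Ignores the negligible chance of a significant result in the wrong direction
    return STD_NORMAL.cdf(d * sqrt(n / 2) - z_alpha)


# Try a range of plausible effect sizes rather than a single guess
for d in (0.3, 0.5, 0.8):
    print(f"d = {d}: about {n_per_group(d)} per group")

# Working the other way: power of a small study (20 per group, medium effect)
print(f"power with 20 per group at d = 0.5: {approx_power(20, 0.5):.2f}")
```

Run as is, it reports about 176, 64, and 26 per group for d = 0.3, 0.5, and 0.8 (the last two agree with Table 10.1), and a power of only about 0.35 for 20 participants per group at d = 0.5, well below the 0.50 floor suggested above. For planning a real study, an exact tool such as G*Power is preferable.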

Power analysis tables are provided in various sources. Cohen (1988) and Kraemer 
and Thiemann (1987) give detailed treatments, and Cohen’s (1992) “power primer” 
presents a clear summary of the central concepts and a useful table to calculate sample 
sizes for common designs. There is also software available online to do the calculations, 
such as G*Power (Faul, Erdfelder, Fang, & Buchner, 2007), which is freely downloadable, 
and there is a list of web-based calculators at http://statpages.org/#Power. 
Tables 10.1 and 10.2 summarize the sample size estimates for two common statistics, 
t-tests and correlations. For example, in a design which compares two groups using a 
t-test, with a medium effect size and an alpha of 0.05, a sample of 64 per group is 
needed to attain a power of 0.80; with a larger effect, the required sample size decreases. 
Studies with many variables (e.g., factor analytic studies of long inventories) or many 
subgroups (e.g., norming a psychological test on different subpopulations) require 
larger samples. In fact, the sample size requirements for certain types of research, for 
example, comparative therapy outcome research, are so large that we recommend that 
you only conduct such studies if you have the adequate funding and staffing to do so. 


Table 10.1 Estimated sample sizes for t-tests 

Effect size (Cohen’s d)      n per group    total n 
medium (d = 0.5)             64             128 
large (d = 0.8)              26             52 

Note: alpha = 0.05; power = 0.80 


Table 10.2 Estimated sample sizes for correlations 

Effect size (Pearson’s r)    total n 
medium (r = 0.3)             84 
large (r = 0.5)              28 

Note: alpha = 0.05; power = 0.80 
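For correlations, a common approximation works on Fisher's z transformation of r, whose standard error is 1/√(N − 3). The sketch below assumes a two-sided test of a zero population correlation, and the function name is hypothetical. It is slightly conservative: it gives 85 and 30 for r = 0.3 and 0.5, a little above the 84 and 28 in Table 10.2 (which were presumably obtained by more exact methods), and 783 for a small effect (r = 0.1).

```python
from math import atanh, ceil
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate total N to detect a correlation r (two-sided test of rho = 0)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(power)
    fisher_z = atanh(r)  # Fisher's z = 0.5 * ln((1 + r) / (1 - r))
    # Fisher's z has standard error 1 / sqrt(N - 3), hence the "+ 3"
    return ceil(((z_alpha + z_beta) / fisher_z) ** 2 + 3)


for r in (0.1, 0.3, 0.5):
    print(f"r = {r}: about {n_for_correlation(r)} participants in total")
```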


Alternative Approaches to Sampling and Generalizability 

Qualitative research typically uses smaller samples than traditional quantitative research, 
as does, obviously, small-N research. Unsurprisingly, the most common criticism of 
such research is that you cannot generalize the results. In this section, we will describe 
some alternatives to the traditional approach to sampling and generalizability. 

Generalizability through Replication 

A rational (as opposed to a statistical) approach to generalizability and sampling can 
be found in the behavioral N = 1 tradition (see Chapter 9), in which research is carried 
out one case at a time, varying the conditions and relevant client characteristics and 
measuring the effects until you achieve an understanding of the causal relationships 
involved. The relevant characteristics of the case, including any background and 
situational variables that appear to be important, are carefully described. 

In this approach, you then attempt to replicate the first case study by finding a case 
as similar as possible to the first case (this is referred to as direct replication: see Sidman, 
1960). If you obtain different results (i.e., there is a failure to replicate), you try to 
understand what made this case different from the first, and then try to find a case that 
matches the first (or second) on this variable. If the same results are obtained, you 
next begin to vary apparently relevant features of the case in order to establish the 
limits of generalizability in a rational manner (this is referred to as systematic replication). 
Replications establish the breadth or range of generalizability, while failures to replicate 
establish the limits of generality, just as a control group would in traditional 
research; the two complement each other. Thus, as Cook and Campbell (1979) note, 
external validity is better served by a number of small studies with specified samples 
than by a single large study. Cronbach (1975) refers to this approach as locally intensive 
observation; Hayes et al. (1999) call it a clinical replication series (which can also be 
thought of as a form of multiple baseline design). 

Bayesian Approach 

Statistical power analysis relies on the traditional null hypothesis testing approach to 
statistics. In contrast, researchers working from a Bayesian approach consider that any 
new data add to the sum of knowledge. Therefore, small-sample, “underpowered” 
studies are not necessarily to be avoided, but it is still the case that the larger the 
sample size, the more the study will add to prior knowledge (Dienes, 2011, 2014; 
Edwards, Lilford, Braunholtz, & Jackson, 1997). 

Falsificationist Approach 

From a falsificationist framework (cf. Popper, 1963; see Chapter 2), researchers are 
not concerned with representativeness or generalizability, but with looking for 
counterexamples to existing theory. These could consist of a single case (Dukes, 1965; 
Meehl, 1978). For instance, in clinical neuropsychology, a single example of a patient 
with a certain pattern of abilities may invalidate a proposed model of mental structure 
(Shallice, 1988). In these circumstances, qualitative or quantitative descriptive 
research that establishes the existence of the counter-example can be of crucial 
importance. 

Networking or Snowballing 

When the size and composition of the target group are unknown at the outset, it is 
possible to use a sampling procedure known as networking or snowballing (Patton, 
2002; Biernacki & Waldorf, 1981), which operates by asking each respondent to 
name one or two other people who fit the research criteria. Sampling continues up to 
the point where no new respondents are identified or the intended sample size is 
reached. For example, Pistrang (1990) used this method to study the mental health 
needs of London’s Chinese community. She wanted to interview community and 
health workers who were involved with the Chinese population in London’s West 
End. Before the project started, it was not known precisely how many such workers 
there were or where they were to be found, but increasing numbers of interviewees 
were located via networking as the project progressed, up to a final total of 20. 
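The networking procedure can be pictured as a breadth-first walk through a referral network. The sketch below is purely illustrative: the network, the letter labels, and the `snowball_sample` function are hypothetical, and it assumes each respondent is asked to name at most two eligible people. As described above, it stops when no new names emerge or the intended sample size is reached.

```python
from collections import deque


def snowball_sample(seeds, referrals, target_size, max_referrals=2):
    """Simulate snowball (networking) sampling over a referral network.

    seeds:         initial respondents known to the researcher
    referrals:     maps each person to the eligible people they would name
    target_size:   intended sample size
    max_referrals: how many names each respondent is asked for
    """
    sample = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < target_size:
        person = queue.popleft()
        sample.append(person)
        # Ask this respondent to name others who fit the research criteria
        for named in referrals.get(person, [])[:max_referrals]:
            if named not in seen:  # only follow up people not already contacted
                seen.add(named)
                queue.append(named)
    # Ends when no new respondents are identified or the target is reached
    return sample


# Hypothetical referral network among community and health workers
referrals = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["E"],
    "E": ["C", "F"],
    "G": ["H"],  # a separate group that no one in A's network knows
}

print(snowball_sample(["A"], referrals, target_size=20))
```

Starting from respondent A, the walk recruits A through F and then stops because no new names emerge. G and H are never reached, which is one concrete form of the sampling bias discussed next.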

A potential problem with the snowballing procedure is that the initial respondents 
might direct you to other like-minded people who share their viewpoint, and thus the 
researcher needs to be aware of possible biases in the achieved sample. 

Purposive Sampling 

In qualitative and case study research, the term purposive sampling is often used to 
denote a systematic strategy of selecting the participants according to criteria that are 
important to the research questions. It is similar to specifying the target population in 
quantitative research, in that the researcher attempts to select participants fitting 
specific criteria, but it is a less rigid process, being guided by the researcher’s judgment 
(Robson, 2014). One example is heterogeneity sampling, where the researcher attempts 
to recruit participants with a broad range of demographic characteristics, for example, 
for members of a focus group in qualitative research. 

Theoretical Sampling 

In grounded theory, the sampling approach is referred to as theoretical sampling (Corbin & 
Strauss, 2015). It is a type of purposive sampling, in which the researcher’s emerging 
theory determines the sampling strategy as the study develops. In this approach, you need 
to start analyzing your data early on in the data collection process. Theoretical sampling 
resembles the replication sampling approach of the behavioral single-case researchers. 
The difference is that the behaviorists are trying to establish control over behaviors, while 
the grounded theorists are trying to develop a rich description and test emerging theory. 

The procedure is that the researcher analyzes the data as they are collected, and 
develops tentative theoretical concepts from early on in the study. As ideas form about 
what the important dimensions or conditions are, the sampling strategy is modified to 
take these into account. For example, in a study of postpartum depression, the 
researcher may theorize, after interviewing several women, that an important aspect is 
the degree of control that the women experienced over their childbirth. She may then 
sample women who had different types of childbirth procedures, in order to examine 
variations in perceived control and its consequences. 

In grounded theory, sampling stops when little new information emerges and a rich set 
of categories has been developed. This is referred to as saturation (Corbin & Strauss, 
2015), a useful concept that applies to sampling within a wide range of qualitative methods. 
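Saturation is ultimately a matter of the researcher's judgment, but it can help to keep a simple record of how many new categories each interview contributes. The following is a hypothetical bookkeeping sketch: the category labels are invented (echoing the childbirth example above), and the rule of stopping after three consecutive interviews without new categories is an arbitrary assumption, not a grounded theory standard.

```python
def saturation_point(interview_categories, run_length=3):
    """Return the interview number at which `run_length` consecutive interviews
    have added no new categories, or None if that never happens."""
    known = set()
    run = 0  # consecutive interviews with nothing new
    for number, categories in enumerate(interview_categories, start=1):
        new = set(categories) - known
        known |= new
        run = 0 if new else run + 1
        if run >= run_length:
            return number
    return None


# Invented categories coded from six interviews in a hypothetical childbirth study
interviews = [
    {"control over birth", "pain"},
    {"control over birth", "partner support"},
    {"partner support", "sleep loss"},
    {"pain"},
    {"control over birth", "sleep loss"},
    {"partner support"},
]
print(saturation_point(interviews))
```

Here interviews 4 to 6 add nothing new, so the function returns 6. A count like this can prompt the question of whether saturation has been reached, but the richness of the categories still has to be judged qualitatively.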

Internet Sampling 

The internet provides a convenient way for researchers to recruit participants and also 
to carry out research procedures, such as surveys, experiments, and interviews. 
Recruitment can be via websites, targeted emails, or social media. It is particularly 
useful for recruiting people experiencing rare conditions (Birnbaum, 2004). 

Some concerns have been raised about the quality of samples gathered via the 
internet: potential problems are unrepresentative samples, fraudulent data, and multiple 
responding (Kraut et al., 2004; Wright, 2005). It is also important to remember 
that a digital divide still exists, with economically disadvantaged members of the 
population being less likely to have internet access (Dillman et al., 2009). However, 
the emerging evidence appears to indicate that the characteristics of internet samples 
are largely comparable to those of the offline samples used in studies reported in 
mainstream journals (Gosling, Vazire, Srivastava, & John, 2004). 


Summary and Conclusion 

The essential point is that researchers need to think carefully about whom the conclusions 
of their study can apply to and how they are going to support the strength 
of those conclusions. All too often, clinical psychology researchers seem to neglect 
sampling and generalizability issues. Unfortunately, there is a long tradition of 
clinicians making overconfident generalizations based on observations of the biased 
sample of clients who have appeared in their consulting rooms. Freud’s case histories 
were partly responsible for this, as modesty in drawing inferences was not 
one of Freud’s characteristics. Late-19th-century neurotic Viennese women seeking 
psychoanalysis are not a good foundation on which to base general theories about 
the human condition; or, more precisely, it is possible to form one’s theories with 
such a population, but they must be replicated in other ways if they are to have 
credibility. Clinicians often seem unaware that people who seek professional help 
for their psychological problems are in a minority (e.g., Wang et al., 2005). Thus 
clinical researchers need to develop more humility about the limits of application 
of their findings. 

True random sampling, in the sense of drawing participants randomly from a large 
population of potential participants, is rarely performed in clinical psychology research. 
Usually, convenience sampling is used; that is, the sample consists of whoever can be obtained 
at the time of the study (e.g., all the participants who can be recruited in a given time period). Researchers 
need to take this into account when analyzing the data and making generalizations. 

Having dealt with sampling issues, we will now examine the other major topic area 
that is raised by working with the participants, that is, ethics. 


ETHICAL ISSUES 


The major ethical principles in clinical psychology research are: 

• Informed consent. The researcher gives full information about the study and 
participants freely choose whether to enter it. 

• Avoidance of harm. Harm may be direct (such as stress or humiliation) or 
consist of deprivation of benefit (such as in control groups in clinical trials). 
There may be a difficult trade-off between the potential harm to individual 
participants and the potential benefits of knowledge to humanity. 

• Privacy is the right not to provide information; confidentiality is the right to 
have any personal information kept securely. 

• All clinical psychology research should be reviewed by experienced researchers 
external to the project, including ethics committees or Institutional Review 
Boards. 


Ethical principles are concerned with protecting the rights, dignity, and welfare of 
research participants. Interest in the ethics of psychological research grew out of outrage 
at earlier abuses, including medical research in Nazi concentration camps during World 
War II. The 1947 Nuremberg Code and the 1964 Declaration of Helsinki set out the 
ethical principles by which medical (and psychological) research is now governed. 

Early stress-induction research by psychologists also caused ethical concerns 
(Bersoff & Bersoff, 1999). These concerns were further fuelled by the widespread use 
of deception in the social-psychological research of the 1950s and early 1960s, which 
shaped the public attitude toward psychologists as scientific deceivers. The civil rights 
movement and populism of the 1960s and 1970s resulted in a greater sensitivity to 
ethics on the part of psychologists (Imber et al., 1986; Korchin & Cowan, 1982). 
Finally, concerns about litigation and the general trend toward increased bureaucratization 
and governmental control of research led in the 1970s and 1980s to government-mandated 
practices for the review of research involving human participants. In the 
United States, there is a “Federal policy for the protection of human subjects” (the 
“Common Rule”: Department of Health and Human Services, 2014); in the United 
Kingdom, the government has set out a comprehensive “Research Governance 
Framework” (Department of Health, 2005) covering all aspects of the conduct of 
research on health and social care in public settings. 

Previous chapters have touched on some ethical issues associated with particular 
research methods or designs, such as covert observation or no-treatment control 
groups. Here we will examine some central principles common to all psychological 
research. Following Korchin and Cowan (1982), we group them under the headings 
of: (1) informed consent; (2) minimization of potential harm/deprivation of benefit; 
and (3) confidentiality and protection of privacy. 

However, before examining these principles, some general points need to be made. 
First, the researcher is under an obligation to explore and seek others’ advice and 
judgments about the specific ethical issues involved in his or her study. Second, as 
Korchin and Cowan (1982) noted, validity and ethics should not be seen as separate 
issues. Instead, unethical practice reduces the external validity of the research, because 
it results in research procedures that cannot be translated into practice. Conversely, 
poorly designed research reduces the ethical standing of the research, because, in such 
situations, there are usually only minimal possible scientific or social benefits to 
counterbalance the risks or costs of participation in the research. Finally, it is worth 
noting that we are operating in the domain of value judgments, in which one needs to 
balance negative effects (usually accruing to the participant) with positive effects 
(usually accruing to society in general). Sometimes there are conflicting ethical 
considerations, and difficult choices need to be made for which there are no clear-cut answers. 


Informed Consent 

Informed consent involves disclosure by the researcher, before the study, of what will 
happen during the study and of any other information that might affect the person’s 
decision to participate. This enables prospective participants to make a free and 
informed decision about whether or not to enter the study. Thus informed consent 
consists of both full information and freedom of choice. 

Full Information 

Full information refers to the principle of telling prospective participants everything 
that they need to know in order to make a rational decision about whether to take 
part in the study. An important corollary is that the participant is able to understand 
the information provided (i.e., that it is not written in overly technical or bureaucratic 
prose or in a language in which the participant is not fluent). 

Problems arise when the person’s understanding of the issues is limited. Informed 
consent becomes difficult with children or with adults who are not fully competent to 
make their own decisions (Bersoff, 2008; Bersoff & Bersoff, 1999), or even with well-informed 
and educated adults in clinical trials in medicine (Thornton, 1992). 
With children under 7, for example, parental permission plus the child’s verbal agreement 
is usually required. If the child is between 7 and 17, then his or her written assent is 
usually required in addition to parental permission. Similarly, with adults with severe 
dysfunctions (e.g., people with severe mental retardation, termed intellectual disabilities 
in UK usage, or people who are psychotic), sensitivity and clinical skills are required, and 
the level of readability and comprehensibility of the description is important. 

A second issue is the role of deception in psychological research. Although it is 
much less common in clinical than in social psychology, there are some well-known 
examples of deception, such as Rosenhan’s (1973) “pseudopatient” study, in which 
participant observers faked a psychotic symptom in order to gain admission to a 
mental hospital as a patient. There is also the less dramatic issue of deception by 
omission: good scientific practice dictates that participants should not be aware of the 
hypotheses under investigation, since this knowledge may cause them to alter their 
behavior. Thus deception is a matter of degree, ranging from relatively trivial instances 
of withholding information about specific hypotheses or naturalistic observation of 
public behavior, to more serious cases of lying to the participants. Deception is an 
especially serious problem when the study uses fictional environments or contrived 
situations (e.g., in Good Samaritan studies when a serious crime or accident is feigned), 
or when double deception (i.e., false debriefing) is used. 

At a minimum, a full debriefing is needed at the end of any study in which deception 
is used, in order to provide complete information, including the rationale for the 
deception, and to answer all questions about the study. However, debriefing cannot 
always be relied upon to undo the effects of the deception, because learning that they 
have been deceived may itself cause participants greater distress. For this reason, 
Korchin and Cowan (1982) recommend that alternative methods be used wherever 
possible, including obtaining the person’s consent to be uninformed, seeking feedback 
from surrogate participants who are similar to proposed participants, role playing and 
simulation research, and naturalistic, descriptive research. 

Freedom of Choice 

Freedom of choice requires that the participant’s consent be voluntary, without direct 
or indirect pressure to take part. There should be no coercion, explicit or implicit. Thus, 
the researcher must foster the prospective participant’s autonomy and self-determination 
and should evaluate implicit situational or personal factors that may limit freedom. 

There is often a considerable power imbalance between the researcher and the 
potential participant. In this case, the problem of making sure that there is no implicit 
coercion becomes acute. This is often an issue in clinical settings, where a therapist or 
doctor wishes to conduct research with his or her patients, who may fear that refusal 
will prejudice their treatment. It is also an issue with “captive” populations such as 
psychiatric inpatients, prisoners, or students, and in settings where there is a shortage 
of mental health services. For example, one of us was required by an ethics committee to destroy 
1000 promotional leaflets for our research clinic because the title said “Free Therapy”; 
it was all right for the service to be free of charge, but the prominent presence of the 
word “free” was seen as potentially coercive for clients facing limited services and long 
waiting lists. Clearly, power imbalances inevitably limit freedom. 

Informed Consent Form 

In practice, the study is described and the participant’s consent is recorded by means 
of an information sheet and informed consent form. Although specific requirements 
vary (depending on the particular study, the setting in which it is conducted, and the 
type of ethics committee), at a minimum these should contain: 

• a description of the study’s procedures; 

• an explanation of its risks and potential benefits; 

• an offer by the researchers to answer questions at any time; 

• the statement that participants may withdraw their consent at any time during the 
study without prejudice, especially without prejudice to their present or future 
treatment; and 

• a space at the end of the form for the potential participant to sign in acknowledgement 
that they understand what the study involves. 

The informed consent form is given to participants to read and sign after the study 
is fully described to them and after they have had a chance to ask any questions about 
it, but before the study proper begins. It is good practice to give participants a 
duplicate copy of the information sheet to retain for their records. With treatment 
studies or more involved research it is also a good idea to allow participants time to 
reflect on whether they wish to take part or not. 


Harms and Benefits 

In general, research should not harm the participants. In ethical theory this is referred 
to as the principle of nonmaleficence. However, some people may freely consent to 
expose themselves to potential harm for the greater good of humanity, for example, 
in testing new medical procedures. There is a trade-off between any harm caused to 
the participants versus the potential gain to humanity from the knowledge acquired, 
and the sense of altruism that goes along with that. 

Harm can be either direct, from harmful psychological or physical events, or indirect, 
from withholding of benefits (such as deprivation of treatment in a control group). 
In psychological research, direct harm is most likely to come from such things as 
stirring up painful feelings or memories, threats to one’s self-image, and humiliation. 
Two extreme examples are Milgram’s (1964) obedience studies, in which participants 
believed themselves to be giving dangerous electric shocks to other participants, and 
Zimbardo’s (1973) prison simulation, in which college students, role-playing prison 
guards, brutalized other participants who were role-playing prisoners. In addition to 
psychological risk to the individual, there is also the possibility of social risk, for 
instance, to members of ethnic or cultural groups who may be harmed by the findings 
of studies examining group differences (Scarr, 1988). In all cases, researchers need to 
set up procedures to ensure that the possibility of harm is minimized and to respond 
to any harm that may occur during the study. 

As part of debriefing the participants after the data collection, you should ask 
whether they experienced any distress or had any concerns during the study. 
Furthermore, if the respondent becomes upset during the study itself, you may 
need to terminate, or at least suspend, data collection. Your clinical skills become 
useful here, both in detecting the presence of distress and also in being able to 
respond to it appropriately. However, in some cases participants may need to be 
referred to sources of help outside of the study, for example, if an interview about 
psychological trauma stirs up painful memories, or if a study of marital interaction 
brings out conflict in the couple that they weren’t fully aware of. Occasionally, 
people may volunteer for psychological studies in order to find a way of getting 
help for their difficulties, and may feel let down when they don’t receive benefits 
they had hoped for. 

Withholding of Benefit in Clinical Trials 

Randomized controlled trials (RCTs) highlight a number of ethical issues (Imber et al., 
1986). Although participants are unlikely to be harmed, there are several dilemmas 
about withholding of benefit (which in ethical theory violates the principle of beneficence). 
In other words, there are tensions between the clinical perspective, which emphasizes 
doing the best for each individual patient, and the scientific perspective, which emphasizes 
having a well-designed study. The researcher must therefore balance the need for useful 
knowledge about a treatment’s efficacy against the likely consequences for individual 
participants, who may receive less than optimal treatment in the study. These tensions 
appear in the following areas: 

• Control and comparison groups. No-treatment or placebo controls mean that some 
patients are deprived of a potentially valuable treatment, being instead given an 
inferior treatment or none at all (see Chapter 8). Wait-list controls pose a less 
serious problem, but they still mean that, for some patients, treatment is delayed. 
Ideally, RCTs are only conducted when there is equipoise (rough perceived 
equivalence) between conditions, or a lack of prior knowledge about their 
equivalence: if the researcher thinks that a particular treatment condition is clearly 
better for a patient, their clinical duty is to give the patient that treatment and not 
enter them into the RCT. Many contemporary RCTs test the treatment of interest 
against treatment as usual (TAU), or the best available treatment, but in practice 
this can be a disguise for a no-treatment condition. 

• Specified treatments versus clinical judgment. Patients in RCTs receive specified, 
pre-determined treatments, often manualized ones, thereby diminishing the 
capacity of the clinician to make informed judgments about how the patient is 
responding, and to vary the therapy accordingly. (On the other hand, a certain 
degree of flexibility is built into many contemporary treatment manuals.) 

• Randomization. As we discussed in Chapter 8, Brewin and Bradley (1989) argued 
that many patients have preferences about which treatments they want, and that 
the act of random assignment to experimental conditions deprives them of choice. 
Being in a less preferred treatment (even though they have consented to the lack 
of choice) may result in a less than optimal outcome. 

• Narrow inclusion criteria. Since clinical trials usually have specific inclusion criteria, 
often based on a single DSM diagnosis, people with significant clinical concerns 
may not be admitted into the treatment program, on the grounds that their 
problems are too complicated. 

• Referrals at termination. In normal clinical practice, a therapist can refer a patient 
for further help at the end of therapy. However, in an RCT which has a follow-up 
assessment, this may be discouraged, as the researchers need to see how the 
patients fare without any additional therapy. Again, patients are deprived of 
optimal treatment as a result, and it may not be ethical practice. 


Privacy and Confidentiality 

Invasion of privacy and loss of confidentiality are special cases of harm. Privacy refers 
to the person’s right not to provide information to the researcher, while confidentiality 
refers to the person’s right (and the researcher’s corresponding obligation) to withhold 
information from third parties. 

In a trivial sense, all psychological research invades privacy, since otherwise it would 
not be finding out anything new. However, the ethical issue of privacy is concerned 
with the intrusiveness of research. Different people have different personal boundaries: 
some do not mind disclosing intimate information about themselves, while others 
want to maintain a tight control on what is known about them. The researcher needs 
to be aware of each participant’s limits on disclosing information and respect their 
right to withhold certain information. 

Types of confidentiality protection include anonymity, in which no identification is 
possible, and the more usual situation of protecting the participant’s identity through 
secure research codes that are separated from the data itself. It is also important that 
data be securely stored, using such procedures as password protection or encryption, 
and also avoiding insecure means of transferring data, such as emails and unencrypted 
memory sticks. Participants are likely to be more open and to provide better data if 
they feel assured of confidentiality safeguards. Finally, it is important to keep in mind 
that no confidentiality guarantee is absolute, in that research records are always 
vulnerable to hacking, theft, or legal subpoena. 
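Separating identities from the data can be as simple as replacing each name with a random study code and keeping the linking key in a different, secured location. The following is a minimal Python sketch; the field names, the `P-` code format, and the `pseudonymize` function are hypothetical. Random codes are used rather than initials or dates of birth, because a code derived from personal details could itself identify the participant.

```python
import secrets


def pseudonymize(records, id_field="name"):
    """Replace the identifying field with a random study code.

    Returns (data, key): `data` holds no names; `key` maps each identity
    to its code and must be stored separately and securely.
    """
    key = {}
    used_codes = set()
    data = []
    for record in records:
        identity = record[id_field]
        if identity not in key:
            # Random code, so it reveals nothing about the person
            code = "P-" + secrets.token_hex(4)
            while code in used_codes:
                code = "P-" + secrets.token_hex(4)
            key[identity] = code
            used_codes.add(code)
        row = {field: value for field, value in record.items() if field != id_field}
        row["code"] = key[identity]
        data.append(row)
    return data, key


# Hypothetical records: the same participant seen at two time points
records = [
    {"name": "Ann", "session": 1, "score": 21},
    {"name": "Bob", "session": 1, "score": 14},
    {"name": "Ann", "session": 2, "score": 12},
]
data, key = pseudonymize(records)
print(data)
```

The returned `data` can be analyzed within the research team, while `key` should be stored separately (for example, on an encrypted drive). Note that this is protection of identity through codes, not anonymity: anyone holding the key can still re-identify participants.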

Ideally, the information sheet or informed consent form should specify who will 
have access to the data and the findings. (As an aside, the adjective “strict,” which often 
precedes “confidentiality,” is superfluous, since something is either confidential or it is 
not.) When audio or video recordings are made, it should be clear who will hold them, 
for what purposes, and for how long; it is good practice to have a separate informed 
consent form to cover consent to make, retain, and possibly publish extracts from 
recordings, or use them for teaching or publicity. When case material is written up, the 
participants’ personal details should be altered so that they are unrecognizable (this 
sometimes requires creativity). However, no guarantee of confidentiality can ever be 
absolute. Just as in clinical practice, the researcher has a duty of care, and if they suspect 
that there will be potential harm to the participant or to others, or if they become 
aware of serious clinical malpractice, then confidentiality may need to be broken. 

The issue of confidentiality becomes increasingly critical as the information becomes 
more sensitive or potentially damaging, should it become known to others. The kinds 
of danger from potential breaches of confidentiality include embarrassment, loss of 
employment, legal action, labeling, and social stigma. In these situations, the 
researcher should give details on the information sheet or informed consent form 
about the kinds of information that the participants will be asked to provide. 


Ethics Self-study Exercise 

We recommend that the researcher review his or her study early on, in order to 
appraise its risks and benefits (Davison & Stuart, 1975). This self-appraisal begins by 
asking: “What risks are possible? How serious are they? How likely are they?” 

The risk estimates typically increase when new procedures (i.e., new measurement 
or intervention methods) are employed, as opposed to established, tested procedures. 
Another important situational factor is the degree of coercion. The researcher should 
ask “What obvious or implicit pressures are operating on prospective participants, 
which may prevent them from refusing to take part?” These may include the need for 
psychological or medical treatment, the wish to impress legal authorities, or the hope of
being released from prisoner or patient status.
Having evaluated the study’s risks, the researcher should then ask “What benefits
are likely? For whom? How realistic are they?” Some benefits may accrue
directly to the participant, including help with problems, self-knowledge or 
growth, general education, and increased self-esteem or altruism; other benefits 
are more general, such as the knowledge gained and the increased potential for 
helping others. 

In general, greater potential risks, lesser benefits, unknown procedures, and 
coercive situations call for stronger safeguards for informed consent and participant 
safety. These safeguards include greater disclosure of risks; careful screening and 
exclusion of at-risk participants; supervision and monitoring of the participant’s 
condition during the course of the study; and the use of contingency plans for 
removing participants from the study and finding appropriate treatment for research- 
induced problems. Finally, Davison and Stuart (1975) argue that there are some 
situations in which it is impossible to conduct ethical research. Prisons, for instance, 
can be said to be inherently coercive to such an extent that no valid consent can be 
obtained; on the other hand, it may be unethical for such institutions to go
unresearched. 

In evaluating the risk-benefit ratio, be aware of the dangers of self-deception: 
there is a tendency for researchers to rationalize and underestimate research risks 
while overestimating benefits, under implicit assumptions such as “The ends justify the 
means” and “What is good for me must be good for psychology.” You will ultimately 
have an easier conscience if you follow the precept that “People are more important 
than data.” 


Ethics Committees 

You cannot do psychological research without coming into contact with the committee 
delegated by your university, hospital, or other agency to review the ethical treatment 
of human participants in research (Bersoff, 2008; Bersoff & Bersoff, 1999; Ceci, 
Peters, & Plotkin, 1985). These committees are known as Institutional Review Boards 
(IRBs) in the United States and ethics committees in the United Kingdom. The
purposes of this review process are to protect the participants in the research, and also to
protect the institution from legal reprisals for ethical lapses and harm done to research 
participants. Another purpose is to comply with the regulations of grant-giving 
institutions. 

Ethics committees are typically made up of academics drawn from a range of
disciplines, medical doctors, and lay members. Many committee members may be
unfamiliar with psychological research. In the United States, their make-up is dictated 
federally, including a balance of gender and scientific discipline and the inclusion of 
physicians and lay persons from the community. This range of backgrounds usually 
provides a breadth of perspectives to evaluate the ethical appropriateness of the 
research. However, occasionally, ethics committees appear to exceed their brief, and 
to make decisions on political rather than ethical grounds (Ceci et al., 1985). 
For example, we know of one project that aimed to examine how much psychiatric
patients knew about the side effects of their psychotropic medication and was refused
ethical permission. It seemed that this was not because the project was unethical, but
rather because certain committee members felt threatened by what the results might
say about the state of professional practice.

Committees can sometimes take months to process an application, so it is 
wise to apply early, especially if your research is being done to a tight deadline (this 
particularly applies to student projects). However, there is a dilemma here, since 
if you apply for ethical approval early in the planning stage, before your protocol 
is finalized, your application may look less polished and your study may also 
change somewhat after it has been approved. The application has to be carefully 
thought through before submission. If your protocol subsequently changes, for 
example, as a result of pilot studies, you can file an amendment with the ethics 
committee or IRB. 

There are often three levels of review: exempt, expedited, and full review. 

Exempt status. A study may pose such minimal risks as to be exempt from regular 
review. Such research includes: (1) Surveys using interviews or questionnaires, where 
the participants are not identifiable or are not asked to reveal sensitive information of 
a personal or potentially damaging nature; (2) Research on established educational 
practices, where the participants are not at risk and are not identifiable; (3) Research 
using existing archival or public data, where the participants cannot be identified; and 
(4) Overt observation of public behavior, under the same conditions of confidentiality 
and unintrusiveness. 

The catch with exempt status is that you are not allowed to make this
decision yourself (because of possible vested interests). Thus, there is usually some 
form of screening required to determine whether a study should be exempted or not. 
A typical procedure for doing this is to consult with the ethics committee chair or 
one’s departmental review committee. 

Expedited review. The next level of review is expedited, a fast-track review process 
for low-risk studies. Examples include the use of archival data where a particular use 
of the data has not previously been consented to; and non-stress-inducing behavioral 
research without manipulation of participants’ behavior or emotions. In expedited 
review, the researcher still submits an application to the committee, which may
subject the study to limited review by a subcommittee (e.g., the chair plus one other
committee member). 

Full review. The third level is full review, which applies to everything that does not 
fit the exempt or expedited criteria, to all government grants, and to all research with 
people who are not competent to give informed consent, such as adults with cognitive 
impairments, and children and adolescents. Often, the researcher may be requested to 
meet with the committee to answer questions about the study. 

Some research practices, such as deception and covert observation, raise red 
flags and are usually scrutinized carefully by ethics committees. These practices 
have a number of potential costs (Bersoff & Bersoff, 1999), including the fact 
that they tend to undermine trust in psychology; they may change people’s 
behavior (e.g., decreasing bystander intervention in emergencies because people 
now think it might be an experiment); and their artificiality may yield distorted 
findings of low external validity. Finally, proposals for work in socially sensitive 
areas (Sieber & Stanley, 1988), such as child sexual abuse, are more thoroughly 
scrutinized. 
CHAPTER SUMMARY

This chapter has covered issues concerning the researcher’s relationship with the 
participants: how participants are obtained for the study and how they are treated 
once they are in it. The process of obtaining the participants is known as sampling. 
It involves specifying the target population from which the sample will be drawn, 
choosing the sampling procedure, and determining the sample size. It also involves 
thinking about the universe to which the results of the study are intended to be 
applied. The degree to which a study’s results can be generalized is known as external 
validity: it is one of Cook and Campbell’s (1979) four validity types.

In most types of research, it is desirable to have an unbiased sample, representative 
of the target population from which it is drawn. There are various sampling techniques 
available to achieve this. However, psychological research often tends to rely on 
convenience samples, which may introduce biases. In traditional quantitative research, 
the sample size is determined by statistical power analysis. There are alternative 
approaches to sampling, such as systematic replication in N = 1 research and purposive 
or theoretical sampling in qualitative studies. 
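The summary’s point that sample size is set by power analysis can be made concrete with a minimal sketch. It assumes the simplest case, two independent groups compared on a continuous outcome with a two-tailed test, and uses the normal approximation n = 2(z₁₋α/₂ + z₁₋β)² / d² per group, where d is Cohen’s standardized effect size. The function name `n_per_group` and the defaults (α = .05, power = .80) are illustrative choices, not a prescribed procedure; because the approximation ignores the t distribution, its answers run slightly below the exact figures in published power tables such as Cohen’s (1992) primer.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate participants needed per group for a two-tailed,
    two-group comparison of means with standardized effect size d.
    Normal approximation only: exact t-test values are slightly larger."""
    if d <= 0:
        raise ValueError("effect size d must be positive")
    z = NormalDist()                    # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for a two-tailed test
    z_beta = z.inv_cdf(power)           # quantile corresponding to desired power
    n = 2 * (z_alpha + z_beta) ** 2 / d ** 2
    return math.ceil(n)                 # round up: participants come whole


# Conventional small, medium, and large effects (Cohen's d)
for label, d in [("small", 0.2), ("medium", 0.5), ("large", 0.8)]:
    print(f"{label} effect (d = {d}): {n_per_group(d)} per group")
```

Because d enters the formula squared, halving the expected effect size roughly quadruples the required sample, which is one reason why studies of small effects are so often underpowered.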

Ethical principles are concerned with protecting the rights, dignity, and welfare of 
research participants. The central ethical issues are informed consent, minimization of 
harm, and confidentiality. Informed consent has two components: that the researcher 
gives full information about the study and that participants are able to freely choose 
whether to enter it or not. It is important to be aware of the subtle pressures on 
people to participate, particularly when the researcher is in a position of power or 
authority. In clinical psychology research, harm may be direct (such as stress or
embarrassment) or it may consist of deprivation of benefit (such as when participants in
control groups in clinical trials get an inferior treatment, or none at all). There may be
a difficult trade-off between potential harm to individual participants and potential
benefits of knowledge to humanity. Privacy and confidentiality are special cases of
potential harm: privacy is the individual’s right not to provide information;
confidentiality is the right to have any information kept securely.

All clinical psychology research should be reviewed by peers, including ethics 
committees or Institutional Review Boards. 


FURTHER READING 

Sudman’s (1976) book, Applied Sampling, though dated, is still a useful resource, and
Minke and Haynes (2011) have a more recent, chapter-length treatment. There is an
accessible summary available on the internet from the UK National Audit Office 
(http://www.nao.org.uk/wp-content/uploads/2001/06/SamplingGuide.pdf). 
Cohen (1990) gives a good overview of the issues in statistical power analysis 
and Cohen (1992) provides a “power primer” covering the most commonly used 
cases. Alternative views of sampling and generalization are covered in Patton (2002), 
Sidman (1960), and Corbin and Strauss (2015). 

Researchers should familiarize themselves with their relevant set of ethical principles 
(e.g., American Psychological Association, 2002, 2010a; British Psychological Society, 
2010). Bersoff’s (2008) comprehensive edited volume gives a general overview of
ethics in psychology, with useful case vignettes. The chapters by Korchin and Cowan
(1982) and Bersoff and Bersoff (1999), in successive editions of Kendall, Butcher,
et al.’s Handbook of Research Methods in Clinical Psychology, both give interesting
discussions of ethics in clinical research; we have drawn on them heavily here.


QUESTIONS FOR REFLECTION 

1. What sampling strategies make the most sense for your topic area? 

2. Discuss the social or ethical consequences of carrying out underpowered research. 

3. Carry out the risk/benefit assessment exercise for your project (see Ethics
Self-study Exercise) and reflect on what you learned from the process. For example,
what changes to your study would you make as a result of the exercise?

4. An emerging area of research ethics is social costs. Comment on: (a) ethical issues 
involved in research that may be damaging to particular groups of people (e.g., 
based on ethnicity, gender, etc.); (b) the risks of excessive regulation of research 
stifling scientific creativity and spontaneity. 
KEY POINTS IN THE CHAPTER

• Evaluation is applied research that aims to assess the worth of a service, often 
judging it against specified goals. 

• It includes service audit, quality assurance, needs assessment, and outcomes 
evaluation. 

• Organizational and political issues are crucial in evaluation research. 

• The evaluator begins by asking two basic questions: 

• “What is the service trying to do?” 

• “How will you know if it has done it?” 

• Evaluations can address the process or outcome of a service. 

• Process evaluation examines who is coming to the service and what service 
they are being given. 

• Outcome evaluation examines the impact of the service—whether users 
benefit or not. 

• In addition to examining clinical outcome, evaluators may look at client 
satisfaction, and at economic indicators of costs and benefits. 


In everyday parlance, evaluation means judging the worth of something. Good applied 
psychologists do this informally: they build up a personal knowledge base of which 
interventions work best with whom. Clinical psychology training, in particular,
emphasizes a reflective, self-critical attitude towards one’s work, and encourages evaluation of
one’s own practice. 

Here we will use the term “evaluation” in a more formal sense, to denote applied 
research into the implementation and effectiveness of clinical services. 


Research Methods in Clinical Psychology: An Introduction for Students and Practitioners , 
Third Edition. Chris Barker, Nancy Pistrang, and Robert Elliott. 

© 2016 John Wiley & Sons, Ltd. Published 2016 by John Wiley & Sons, Ltd. 
Companion Website: www.wiley.com/go/barker 
Much of the early evaluation work was done in the United States in an educational
context, where it is known as program evaluation. It arose as a way of monitoring the
federal money spent on large-scale social programs in the 1950s and 1960s, such as
Head Start, a preschool educational intervention program (Rossi et al., 2004; Shadish,
Cook, & Leviton, 1991).

This chapter departs from our chronological, research-process framework. We 
have so far concentrated on fundamental issues in research methods, which can be 
applied across different content areas of psychological research. This chapter draws 
on ideas from the groundwork, measurement, and design chapters, and applies 
them to the task of studying specific services in specific settings. Evaluation is a 
messy area in which sociopolitical and organizational issues are often as prominent 
as scientific ones (Cowen, 1978; Rossi et al., 2004; Weiss, 1972). The design
compromises that we discussed in Chapter 8 become more acute here. Evaluation
researchers often face a Hobson’s choice: they can either collect inadequate data or 
no data at all. 

We are devoting a separate chapter to evaluation both because it has its own 
distinct body of literature and because we anticipate that many readers will never 
conduct basic clinical research, but may well become involved in evaluation 
research. We argue that evaluation should be a routine part of clinical psychology: 
much clinical work is based on custom and practice rather than any formal 
knowledge base, and evaluating it is a way of seeing whether or not it lives up to its 
claimed benefits. 

Planning an evaluation begins with two questions: “What is the service trying to 
do?” and “How will you know if it has done it?” The procedures used in evaluation 
research aim to answer these basic questions. This chapter looks at the practical issues 
in incorporating evaluation into working clinical services. Before that, however, we 
will examine some of evaluation research’s basic concepts and specialized vocabulary. 


What is Evaluation? 

We defined “evaluation” above as a form of applied research. However, as we
discussed in Chapter 2, the distinction between pure and applied research is better
regarded as a continuum rather than a dichotomy. Evaluation, at the applied end
of the continuum, differs from pure research in several ways (Hayes et al., 1999;
Pawson & Tilley, 1997; Weiss, 1972):

• Its primary aim is to assist decision making, rather than to add to an existing body
of knowledge. Thus it tends to be less concerned with theory and more with
solving a particular setting’s operational problems.

• It is done on behalf of a decision-maker, often a manager, who may be distinct 
from the evaluator. 

• It takes place in a complex “action setting” (Weiss, 1972), as opposed to a more 
controlled academic research environment. 

• Its participants are usually users of the service, rather than research volunteers, 
and their interests as clients are paramount. 

• It is intended to be used soon, and is usually done under time pressure. 
• It is often written up for purely local consumption, rather than for wider
dissemination in professional journals. This may be because it does not meet exacting
scientific standards, because the results are not generalizable beyond the particular
service being evaluated, or because the time and effort needed to write up the 
findings for publication may be beyond the evaluator’s resources. Sometimes, 
also, evaluators are unable to publish their findings because the people who 
commissioned the study do not want its results to be known by their competitors 
or by the general public. 

Types of Evaluation 

Evaluation has its fair share of jargon, and several terms are used to describe the type 
of evaluation being conducted. 

Scriven (1972) classified evaluation into formative and summative approaches. A 
formative evaluation is typically used for internal program purposes, and feeds back its 
results to influence the service as it continues to develop (or form itself). A summative 
evaluation provides an overall summary, typically for administrative purposes; it is often 
done on a larger scale with its results delayed until after the end of the evaluation 
period. Formative evaluations thus lend themselves to evaluating new services, while 
summative evaluations lend themselves to well-established ones. 

Donabedian (1980), a key figure in the quality assessment literature, distinguished 
three different foci of evaluation: (1) structure refers to the resources that are available for 
a service, such as staff, buildings, and equipment such as psychological tests; (2) process 
refers to the activities that constitute the service delivery—in psychology these are 
essentially a series of help-intended conversations or assessment procedures; (3) outcome is 
how the service affects the clients, for example, how they change psychologically as a 
result. The parallel concepts of input, activities, and output, which originated in economics, 
are also sometimes used (Fenton Lewis & Modle, 1982). The present chapter mostly
addresses process evaluation: evaluation of structure is psychologically uninteresting 
(except from an organizational development point of view), and outcome evaluation 
overlaps considerably with our earlier discussion of design (Chapter 8). 

The variables to be examined in an evaluation can also be conceptualized using 
Maxwell’s (1984, 1992) widely cited list of six criteria for quality assessment: access to 
services, relevance to need, effectiveness, equity, social acceptability, and
efficiency/economy. For example, Parry (1992) used this framework to address how psychotherapy
services might be evaluated. 

Evaluation, Audit, and Quality Assurance 

The term clinical audit (or service audit) is current in the United Kingdom (Benjamin,
2008; Cape & Barkham, 2002; Crombie, Davies, Abraham, & Florey, 1993). Its
history in medicine stretches back to the beginning of the 20th century (Lembcke,
1967; Young, 1982). Audit is a loosely defined term that refers to an intensive
examination of one or more aspects of a service. For example, an out-patient psychotherapy
service might audit the ethnic background of its referrals. An audit can be specific, as
in this example, or it can be more wide ranging.

Definitions of audit tend to emphasize comparison against an agreed standard (e.g.,
Cape & Barkham, 2002; Crombie et al., 1993). For example, an audit of waiting
times in an out-patient service might involve the standard that all clients should
receive an appointment within six weeks of referral. Under this definition, both
monitoring practice before a standard has been set and developing the standard
itself are important precursors to audit, but not audit proper.


Figure 11.1 The audit cycle

Audit is often depicted as a circular process—known as “the audit cycle” 
(Figure 11.1)—which emphasizes using audit to make changes, either in clinical 
practice or in the standards governing practice (Benjamin, 2008). Audit thus 
involves a continuous process of evaluating, feeding back, making changes, and 
evaluating again. 
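As a minimal sketch of the “compare practice against the standard” step, the Python below assumes the six-week waiting-time standard from the example above (42 days, with a wait of exactly 42 days counted as meeting it, which is an assumption) and checks an initial audit and a re-audit after some change to practice. The function `audit_waiting_times`, the `AuditResult` class, and all the waiting-time figures are hypothetical, invented purely for illustration.

```python
from dataclasses import dataclass

# Hypothetical standard: first appointment within six weeks (42 days) of referral
SIX_WEEKS_DAYS = 42


@dataclass
class AuditResult:
    n_met: int    # referrals seen within the standard
    n_total: int  # all referrals audited

    @property
    def proportion_met(self) -> float:
        # Guard against an empty audit sample
        return self.n_met / self.n_total if self.n_total else 0.0


def audit_waiting_times(waits_in_days, standard_days=SIX_WEEKS_DAYS):
    """Compare observed waits (referral to first appointment, in days)
    against the agreed standard; a wait equal to the standard counts as met."""
    n_met = sum(1 for w in waits_in_days if w <= standard_days)
    return AuditResult(n_met=n_met, n_total=len(waits_in_days))


# Invented data for two turns of the audit cycle: before and after a change
first_round = [10, 25, 40, 42, 50, 70, 30, 35]
second_round = [12, 20, 28, 33, 41, 45, 19, 26]

for label, waits in [("first audit", first_round), ("re-audit", second_round)]:
    r = audit_waiting_times(waits)
    print(f"{label}: {r.n_met}/{r.n_total} within standard ({r.proportion_met:.0%})")
```

Holding the standard fixed across rounds is what lets the re-audit show whether the change in practice made a difference; if the team revises the standard instead, the next round has to be judged against the new standard.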

Both audit and evaluation are closely linked to quality assurance, which emphasizes
setting up procedures to ensure that the standard of a service’s work remains
consistently high (Cape & Barkham, 2002; Green & Attkisson, 1984; Young, 1982).
Quality assurance is related to various quality management methods originating from 
business and industry, such as total quality management, statistical process control, 
and continuous quality improvement (Cape & Barkham, 2002). Methods for quality 
assurance in clinical psychology could include establishing clinical practice guidelines 
(Parry, Cape, & Pilling, 2003), peer review, and systematic involvement of service 
users in monitoring delivery. 

Audit and evaluation are retrospective, looking at the service after it has
happened (although the results will naturally be fed back to help improve the service).
Quality assurance, on the other hand, is essentially prospective, ensuring that no
future problems occur in the service (although it is also retrospective in the sense
of identifying problems and making sure that they do not happen again). To take
a hypothetical example from manufacturing, where much of this language
originated, evaluation (or audit, or quality inspections) will count the number of bugs
in the baked beans; quality assurance procedures will try to stop them getting in
there in the first place.


The Sociopolitical Context 

It is vital never to underestimate the sense of threat that accompanies evaluation. Even 
people who feel largely positive about it will often be worried or irritated by it; other 
people may just pay lip service to enlightened attitudes about evaluation, but ultimately
be defensive and obstructive. Some of the most important concerns are as follows:

• An oppressive sense of being continually scrutinized, which can feel like Orwell’s
“Big Brother is watching you.”

• Resentment at having to take the time to provide the data for evaluation, since it 
leaves less time for client contact. 

• Fear that the results of evaluation may provide ammunition for managers or other 
colleagues to attack the quality or quantity of work being done. 

• Annoyance that the criteria used in evaluations do not capture the important 
aspects of a service’s work. Evaluations may just focus on quantitative measures 
that are easy to collect, such as numbers of clients seen, rather than more valid but 
less tangible indicators of quality. 

These are important objections. Even if you do not feel them strongly yourself, they
will undoubtedly be felt, if not voiced, by a significant proportion of your colleagues.
As we discussed in Chapter 3, this is an area where clinicians can use their skills to 
understand and possibly reduce the sense of threat. Clinical psychologists typically 
have had better training in this than other professionals involved in evaluation. 

Internal versus External Evaluation 

We have been tacitly assuming that you are evaluating a service that you yourself 
partly deliver: this is often called an in-house evaluation. Alternative possibilities are 
that an external evaluation consultant be used, or that an externally conducted 
inspection may be mandated by regulatory bodies. External consultants are usually 
less emotionally attached to the service and more able to weigh it dispassionately. On 
the other hand, external evaluators are usually more threatening, less knowledgeable 
about the service, and more expensive. For the rest of the chapter, we will assume that 
you are conducting an in-house evaluation, since that is the more common situation. 
However, psychologists are sometimes employed as external consultants to evaluate 
other services. External evaluations cover the same ground as in-house evaluations, 
but in addition they require the evaluator to possess specialized consultancy skills. 

Our own view is that, despite its potential difficulties, evaluation of the services 
they deliver needs to become a routine component of psychologists’ work, and that 
evaluation can be made more relevant if conducted by the psychologists themselves. 
The mental health field, in particular, is awash with poorly monitored programs and 
interventions. No one knows what their effects are, and there is often at best a lack 
of interest in, and at worst a contempt for, the views of the clients. Furthermore, the 
climate of managed care in the United States and clinical governance in the United 
Kingdom emphasizes evaluation, audit, and quality assurance (Cape & Barkham,
2002; Crombie et al., 1993; Lyons, Howard, O’Mahoney, & Lish, 1997; McSherry & 
Pearce, 2010). So the issue is not whether to evaluate, but how to. We believe that 
it is better to take control of evaluation yourself, than to have it imposed upon you. 

What Stakeholders Want from Evaluation 

The various stakeholders (evaluation jargon for someone who has an interest) in the 
service will each have different reasons for wanting the evaluation done (Rossi et al., 
2004). These reasons are not necessarily mutually incompatible, but each stakeholder
will attach their own weighting to each one. For example: 

• People funding a service (e.g., managers or grant-giving bodies) may want to 
know whether it is doing what it is supposed to be doing, and whether it is using 
its resources effectively. 

• Clinicians may want to test the effectiveness of an intervention or to compare it to 
other interventions. They may also want to know if their professional time is being 
used efficiently. 

• Service planners may want to justify the development or continuation of a service, 
or to improve its delivery. 

• Service users, or their parents, guardians, or carers, may be concerned about the 
accessibility, convenience, and effectiveness of the service. 

• Community leaders may want to know if the service is reaching its intended target 
population. 

Aside from these overtly expressed, rational reasons, there may also be some less 
legitimate, covert reasons for evaluating (Weiss, 1972). For example, evaluation may 
be used to delay making a decision, or as an empty public relations exercise, or as a 
way of generating information that can be used to justify closing down an awkward 
service. Evaluation is a complex political arena, in which some people do nasty things 
for nasty reasons, but rarely admit that they are doing so. 

The next section examines the preparatory thinking that is needed to set up an 
evaluation. Then we will look at ways of monitoring the process of service delivery, 
and finally touch on evaluation of impact and effectiveness. 

PREPARATION FOR EVALUATING A SERVICE 

As we stated at the beginning of the chapter, the first question to address in evaluating 
a service is “What is the service trying to do?” This is usually followed by the subsidiary 
question “Why is the service trying to do that?” Before the evaluation proper can 
proceed, these preparatory questions must be addressed. We have adapted the
comprehensive framework that Rossi et al. (2004) set out in their influential evaluation
textbook. In practice, however, it is unrealistic to contemplate a full evaluation of a
service; their framework can be adapted to suit local needs. The process of answering
these two questions can be broken down into the six steps shown in the box below.


Six preparatory steps for evaluating a service: 

1. setting down the aims and objectives; 

2. specifying the impact model; 

3. specifying the target population; 

4. estimating the extent of the target problem in the target population; 

5. assessing the need for the service; and 

6. specifying the delivery system design. 
These tasks are all easier when you are setting up a new service, since evaluation is
much easier to build in at the planning stage, when there is still some flexibility. Moreover,
addressing evaluation issues at the outset can help to define the service’s goals and 
procedures. Specifying how a new service will be evaluated usually helps clarify what 
it is trying to achieve, and vice versa. However, these preparatory steps are also useful 
if you are evaluating an existing service. 


Aims and Objectives 

Aims and objectives are the sine qua non of evaluation, especially for new services. 
They articulate what the service is for. Without knowing what the service is trying to 
do, the evaluator has no benchmarks against which to measure its operation. People 
often confuse aims and objectives, or speak about them as though they are the same. 
However, there is a useful distinction to be made between the two terms. 

Aims are global statements of the desired outcomes of the service, expressed in a 
general, often rather idealized way. For example: “The service aims to reduce depression 
in mothers of young children.” Objectives are specific goals, ideally occurring within a 
specific period of time, that detail what the service is actually going to do to achieve its 
aims and that give specific targets to indicate whether or not the aims have been met. 
The objectives should be clear, simple, and, if possible, measurable, so that there will be 
no ambiguity about whether each one has been reached. Sometimes the acronym 
SMART is used, standing for Specific, Measurable, Achievable, Relevant, and Time- 
related. For example, “The service plans to set up three post-natal depression support 
groups for mothers of children under 2 years of age in the London Borough of Camden 
by the end of the current financial year.” 


Aims and objectives: 

Aims are global statements of desired outcomes. 

Objectives are specific goals, which: 

• ideally occur within a specific time period; 

• detail what the service will do to achieve its aims; 

• indicate whether or not the aims will have been met; and 

• are clear, simple, and, if possible, measurable. 
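To make the idea of a measurable, time-related objective concrete, here is a minimal Python sketch that records the Camden support-group objective as data and checks it against what the service actually achieved. The `Objective` class, its field names, and the deadline date are hypothetical illustrations, not part of any evaluation toolkit.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Objective:
    """A SMART-style objective: a specific, measurable target with a deadline.

    Hypothetical structure for illustration only.
    """
    description: str
    target: int      # measurable target, e.g., number of groups to set up
    deadline: date   # time-related: end of the period the objective covers

    def is_met(self, achieved: int, completed_on: date) -> bool:
        """True if the target count was reached on or before the deadline.

        `completed_on` is the date on which the `achieved` count was reached.
        """
        return achieved >= self.target and completed_on <= self.deadline


support_groups = Objective(
    description="Post-natal depression support groups, mothers of under-2s, Camden",
    target=3,
    deadline=date(2025, 3, 31),  # hypothetical end of the financial year
)

print(support_groups.is_met(achieved=3, completed_on=date(2025, 2, 14)))  # True
print(support_groups.is_met(achieved=2, completed_on=date(2025, 2, 14)))  # False
```

Because both the target and the deadline are explicit, there is no room for disagreement about whether the objective has been reached.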


The exercise of specifying aims and objectives often helps to clarify the goals of 
a service. Carrying it out within a clinical team usually results in the team members 
having a better understanding of each other’s values and assumptions. 
Furthermore, without aims and objectives, team members may not know what 
they are supposed to be doing or may even be pulling in different directions or 
undermining each other. For example, in a community alcohol service, some 
members may emphasize prevention, others counseling, some individual work, 
others work with couples or groups, yet others research. While such diversity is 
clearly healthy, the team also needs a sense of direction so that its energies are not 
spread too thinly. 


The Impact Model 

The impact model specifies the theoretical or empirical basis for each of the activities 
that the service is undertaking. It may never be formally specified, but thinking about 
each of its three components helps team members to plan an effective service. These 
components are: 

• The causal hypothesis, which describes what causes or maintains the target 
problem(s) that the service is seeking to modify. 

• The intervention hypothesis, which specifies how the proposed intervention will 
affect those causal determinants. 

• The action hypothesis, which asserts that the proposed intervention will in fact 
reduce the target problem(s). 

For instance, in our maternal depression example, the causal hypothesis is that 
depression in mothers of young children is partly caused by a lack of social support; 
the intervention hypothesis is that a support group will increase social support; and 
the action hypothesis is that the support group will decrease maternal depression. 
These three parts of the impact model are depicted in Figure 11.2. 

Sometimes, however, it may not be possible or necessary to address the cause of 
the problem directly. For instance, with adult survivors of child sexual abuse we 
cannot alter the cause, because it occurred years ago. Furthermore, addressing the 
cause may not be the best strategy for alleviating the target problems: etiology 
does not necessarily determine treatment. The point of specifying the impact 
model is simply to make the rationale for the service’s actions as explicit as 
possible. 


The Target Population 

Having specified the impact model, the next step is to identify the targets, direct 
and indirect, for the intervention. Direct targets are those people on whom the 
intervention is specifically focused, for instance, mothers of children under 2. It is 
important to define the unit of analysis, which could be individuals, families, or 
groups. Indirect targets are those people who may benefit indirectly from the 
service, for instance, the families of the above women. Including the indirect 
targets gives a full picture of the impact of the service. Ideally, the targets should 
be specified in the aims and objectives of the service. 


Figure 11.2 The impact model (diagram: the causal hypothesis links no social support to depression; the intervention hypothesis links support groups to that causal factor)

Target boundaries should be clearly defined using both inclusion and exclusion criteria, 
for example, a specified geographical catchment area, and demographic and clinical 
characteristics of the client group. It is important to strike a balance between criteria that are 
overinclusive and those that are too restrictive. The following model, taken from a local 
drop-in service for people with severe mental health problems, is a good example of a 
target population description with both inclusion and exclusion criteria: 

To be in the Target Group, a person has to be: aged over 16 years; be living, staying 
or sleeping out in the South Camden sector of Bloomsbury Health District; have 
severe and enduring mental health problems; have positive or negative symptoms of 
psychotic illness; have had previous contact with mental health services; not be 
actively involved with other services; be experiencing severe social problems. People 
who meet these criteria but whose primary problem is due to the abuse of alcohol or 
drugs do not come into the Target Group. (Compass Project, 1989) 
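Inclusion and exclusion criteria like these translate naturally into a screening rule. The sketch below encodes the Compass Project criteria as a Python function; the `Referral` fields are hypothetical, and in practice each yes/no judgment (e.g., whether problems are “severe and enduring”) would rest on clinical assessment rather than a single flag.

```python
from dataclasses import dataclass


@dataclass
class Referral:
    """Screening information for one referral (hypothetical field names)."""
    age: int
    in_catchment: bool              # living, staying, or sleeping out in the sector
    severe_enduring_problems: bool
    psychotic_symptoms: bool        # positive or negative symptoms
    previous_service_contact: bool
    active_with_other_services: bool
    severe_social_problems: bool
    primary_substance_misuse: bool  # the exclusion criterion


def in_target_group(r: Referral) -> bool:
    """Apply every inclusion criterion, then the exclusion criterion."""
    meets_inclusion = (
        r.age > 16
        and r.in_catchment
        and r.severe_enduring_problems
        and r.psychotic_symptoms
        and r.previous_service_contact
        and not r.active_with_other_services
        and r.severe_social_problems
    )
    return meets_inclusion and not r.primary_substance_misuse


eligible = Referral(34, True, True, True, True, False, True, False)
excluded = Referral(34, True, True, True, True, False, True, True)
print(in_target_group(eligible), in_target_group(excluded))  # True False
```

Writing the rule out this explicitly also exposes borderline cases, such as a referral who is exactly 16, that the written criteria may need to resolve.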


Estimating the Extent of the Target Problem in the Target Population 

When planning a service, it is naturally important to estimate the extent of the target 
problem in the target population. Three epidemiological concepts are useful here. 
Incidence is the number of new cases during a specified time period, for example, the 
one-year incidence of flu. Prevalence is the number of existing cases, either at a specified 
time (“point prevalence”), or during a time interval. For example, the National 
Comorbidity Study Replication (Kessler, Chiu, Demler, & Walters, 2005) gives the 
one-year prevalence rates of psychological disorders in the United States. 

Incidence and prevalence are related to each other by the duration of the illness: 
higher incidence or a longer duration will both increase the prevalence. Incidence is 
a more useful measure for illnesses of short duration such as flu; prevalence is more 
useful for those of longer duration such as Alzheimer’s disease. With psychological 
problems, it is not always clear whether to measure the extent of the target problem 
in terms of incidence or prevalence. For example, in providing services for dealing 
with cases of child abuse, do you want to measure the number of new cases per 
month (incidence), or the total number of cases on the social services list 
(prevalence)? The issue is whether you are concerned with detecting and treating new 
cases as they appear or with knowing the number of existing cases in a population, 
whatever the time of origin. 
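A common epidemiological rule of thumb makes the link between incidence, duration, and prevalence explicit: in a stable population where the problem is relatively uncommon, prevalence is approximately incidence multiplied by average duration. The Python sketch below assumes that steady-state approximation and uses hypothetical figures; it is not a substitute for measuring prevalence directly.

```python
def approx_prevalent_cases(new_cases_per_year: float, mean_duration_years: float) -> float:
    """Steady-state approximation: existing cases ≈ new cases per year × mean duration.

    Assumes a stable population with constant incidence and duration.
    """
    return new_cases_per_year * mean_duration_years


# Hypothetical: two problems with the same incidence but different durations.
short_illness = approx_prevalent_cases(new_cases_per_year=100, mean_duration_years=0.25)
long_illness = approx_prevalent_cases(new_cases_per_year=100, mean_duration_years=8)
print(short_illness, long_illness)  # 25.0 800
```

With the same 100 new cases a year, the long-lasting problem yields 32 times as many existing cases at any one time, which is why prevalence is the more informative figure for conditions of long duration.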

The third concept, population at risk, is the subset of the general population that is 
at greater risk of contracting a disease: intravenous drug users, for example, are a 
population at risk for HIV infection. It is particularly helpful to consider this target 
group for preventive projects. 

There are several methods for estimating the extent of the target problem. There is 
a trade-off between their validity on the one hand and their complexity and cost on 
the other. 

Surveys and censuses can be done in order to get the respondents’ direct estimates of 
the size and severity of a problem. They generally yield the most valid data, especially 
if they include structured interview measures, but they are time-consuming and 
expensive to carry out. Two large national surveys of the prevalence of psychological 
disorders provide useful comparative data: the US National Comorbidity Study 
Replication (Kessler et al., 2005), and the UK National Survey of Psychiatric Morbidity 
(Bebbington et al., 2000a). 

Rates under treatment. The size of the target problem in the target population can 
sometimes be estimated by looking at the rates under treatment in similar 
communities (if they exist). The number of people who seek treatment is usually a small 
fraction of the actual number of cases, but there may be ways of estimating the size of 
the untreated population, based on previous studies. For example, in the US National 
Comorbidity Study, only about a quarter of people suffering from a psychological 
disorder had received help from a mental health specialty service in the last year (Wang 
et al., 2005), and in the UK National Survey of Psychiatric Morbidity, fewer than 14% 
of people with a neurotic disorder were currently receiving any formal treatment 
(Bebbington et al., 2000a). 
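The adjustment described under rates under treatment is simple arithmetic: divide the number under treatment by the proportion of cases expected to seek treatment. The sketch assumes, purely for illustration, a treated count of 500 and borrows the roughly one-quarter figure from the US survey cited above; whether that proportion transfers to another community is itself an assumption.

```python
def estimate_total_cases(treated_cases: int, treated_fraction: float) -> float:
    """Scale up a count of treated cases by the proportion expected to be treated."""
    if not 0 < treated_fraction <= 1:
        raise ValueError("treated_fraction must be greater than 0 and at most 1")
    return treated_cases / treated_fraction


treated = 500  # hypothetical number on the caseloads of similar services
total = estimate_total_cases(treated, treated_fraction=0.25)
untreated = total - treated
print(total, untreated)  # 2000.0 1500.0
```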

Indicators. This method uses statistical techniques, such as multiple regression, to 
predict the size of the target problem from nonclinical criteria. For example, one 
indicator of the number of heroin addicts in a community is the number of arrests for 
sale or possession of the drug (Hartnoll, Daviaud, Lewis, & Mitcheson, 1985). 

Key informants. The researcher can use “networking” or “snowballing” sampling 
methods (see Chapter 10) to find knowledgeable people who might be able to help 
estimate the extent of the target problem. This is a simple and inexpensive method. In 
our experience, 20 or 30 respondents are usually sufficient. The advantage is that it 
develops the support of influential workers in the community; the drawback is the 
possible bias of the individuals surveyed. Qualitative and/or quantitative interviewing 
methods can be used. 
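Snowball sampling can be thought of as a breadth-first search through a network of nominations: start with a few known informants, ask each to suggest others, and stop when enough have been reached. The sketch below assumes the nominations have already been collected into a dictionary; the informant labels and the default stopping point of 25 (within the 20 to 30 range suggested above) are illustrative choices.

```python
from collections import deque


def snowball_sample(seeds, nominations, target_size=25):
    """Recruit key informants breadth-first from a set of seed contacts.

    seeds: informants already known to the researcher.
    nominations: dict mapping each informant to the people they suggest.
    target_size: stop once this many informants have been recruited.
    """
    queue = deque(dict.fromkeys(seeds))  # drop duplicate seeds, keep order
    seen = set(queue)
    sample = []
    while queue and len(sample) < target_size:
        informant = queue.popleft()
        sample.append(informant)
        for nominee in nominations.get(informant, []):
            if nominee not in seen:
                seen.add(nominee)
                queue.append(nominee)
    return sample


# Hypothetical nominations gathered during initial interviews.
nominations = {
    "GP": ["health visitor", "pharmacist"],
    "health visitor": ["midwife", "GP"],
    "pharmacist": ["outreach worker"],
}
print(snowball_sample(["GP"], nominations, target_size=4))
# ['GP', 'health visitor', 'pharmacist', 'midwife']
```

Because each informant is reached through someone else’s suggestion, the final sample reflects the seeds’ own networks, which is one source of the bias noted above.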


Needs Assessment 

Assessing the extent of the target problem in the target population is the first step in 
planning a service, as it gives an indication of what the volume of demand is likely to 
be. However, it is easy to assume that everyone suffering from the target problem 
needs or desires the service, which is not necessarily true. Needs assessments collect 
data that are more relevant to the service’s operation: they are the health-care 
equivalent of market research. 

The concept of need is often used in a technical sense, defined as a problem for 
which there is a potentially effective intervention (McKillip, 1987; Stevens & 
Gabbay, 1991; Thornicroft, 2001). Under this somewhat counterintuitive definition, 
need is assessed by professionals, rather than by the users themselves. It is not 
determined by the severity of the problem, but by whether something effective can 
be done about it. In contrast, demand is defined as what people ask for and supply 
as what is provided. 

Stevens and Gabbay (1991), in an article nicely entitled “Needs assessment needs 
assessment,” discuss the relationship between need, demand, and supply. They depict 
the relationship of the three concepts using a Venn diagram (see Figure 11.3). The 
diagram helps conceptualize and label the areas where the concepts overlap, for 
example, need that is supplied (areas 6 and 7) is called “met need” and need that is 
not supplied (areas 1 and 4) is called “unmet need.” 

Figure 11.3 Need, demand, and supply: influences and overlaps (Venn diagram of three 
overlapping circles: Need, what people benefit from; Demand, what people ask for; 
Supply, what is provided; all influenced by cultural and ethical determinants, with an 
external field where a potential service is not needed, demanded, or supplied). 
© Crown copyright. Reproduced from Stevens & Gabbay (1991) with the permission 
of the Controller of Her Majesty’s Stationery Office. 
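The labelled regions of the Venn diagram correspond to ordinary set operations. As a minimal sketch, assuming need and supply are recorded as sets of interventions for a given population (the intervention names are hypothetical), met and unmet need fall out directly:

```python
def classify_need(need: set, supply: set) -> dict:
    """Label the overlaps between need and supply described in the text."""
    return {
        "met need": need & supply,    # needed and provided
        "unmet need": need - supply,  # needed but not provided
    }


# Hypothetical interventions for one local population.
need = {"CBT for depression", "carer support", "debt advice"}
supply = {"CBT for depression", "counseling"}

result = classify_need(need, supply)
print(sorted(result["met need"]))    # ['CBT for depression']
print(sorted(result["unmet need"]))  # ['carer support', 'debt advice']
```

Demand can be added as a third set in the same way, for example to separate unmet need that people are asking for from unmet need that they are not.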


Needs and demands can be assessed using the methods described above for assessing 
the extent of the target problem. However, such studies are not always popular 
with health-service managers and politicians, as they often imply spending further 
resources to satisfy whatever unmet needs are identified. 


Delivery System Design 

With new services, the foregoing are the preliminary steps in establishing the 
likely need and demand. The final step is to design the service itself. The delivery 
system design, which is ideally set out in the form of an operational policy 
document, specifies how the clinical team will go about delivering the service. It 
includes the organizational arrangements, such as procedures and activities, and 
structural aspects such as the physical setting, staff, and materials that are required 
to provide the service. The discussion needed to produce an operational policy 
document, and the existence of the document itself, may help anticipate some 
common problems in newly established clinical teams. 

Once it has decided on its operational policy, the team can use role-plays or 
simulations to try to predict whether things will work smoothly in practice (e.g., what 
exactly will happen when a client walks in the door or when someone makes a 
telephone referral). Larger scale services may use operations research methods (a set of 
scientifically based procedures to aid decision making) to see if the services are planned 
in an optimal way, for example, whether the staffing levels at different sites are 
appropriate for the anticipated workloads (Taha, 2010). 


MONITORING THE PROCESS OF SERVICE DELIVERY 


Monitoring the process of service delivery: 

• The central question is “Who does what to whom?” 

• Monitoring service delivery can be divided into monitoring coverage and 
implementation. 

• Monitoring coverage asks the question “Who is the service reaching?” 

• A service is biased if it favors certain subgroups of its target population at the 
expense of others. 

• Coverage can be assessed from service records, by surveys, and by examining 
dropouts. 

• Monitoring implementation asks “What service is being given?” 

• Implementation can be assessed by observation, service records or databases, 
and surveys. 


Having gone through the above preparatory steps, the evaluation now focuses on 
what kind of service is being delivered: the process of the service, in Donabedian’s 
(1980) terminology. Monitoring the process of service delivery means asking “Who 
does what to whom?” It also addresses such questions as “Is this service being delivered 
in the best possible way?” and “Is it accessible to its consumers?” (Maxwell, 
1984, 1992). This differs from outcome evaluation, covered below, which assesses 
whether users benefit from the service. 

There are two main targets of monitoring delivery, coverage and implementation 
(Rossi et al., 2004). Monitoring coverage asks the question “Who is the service 
reaching?”, whereas monitoring implementation asks “What service is being given?” 
In addition, there is financial monitoring (to make sure that the funds are being 
properly used) and legal monitoring, to make sure that the service operates within 
the relevant laws (e.g., equal opportunities, health and safety). These latter areas are 
specialized activities, being the province of accountants and lawyers respectively, and 
we will not cover them here. 


Coverage and Bias 

Coverage is defined as the extent to which the service reaches its intended target 
population: is it reaching everyone it is supposed to, or just a certain subgroup of the 
target population, or even mainly people outside the target population? The related 
concept of bias is defined as the extent to which subgroups of the target population 
participate differentially, that is, the degree to which some subgroups receive greater 
coverage than others. Bias may arise from several factors: 

• Self-selection, for example, if only the more motivated people come to a drop-in 
service. 

• Program actions, for example, if staff favor some service users at the expense of 
others. In particular, there may be creaming, that is, a bias towards the more 
advantaged subgroups of the target population. For instance, when Community 
Mental Health Centers were first set up in the United States in the 1960s and 
1970s, they tended to see a large proportion of better functioning people who 
were easier to work with, neglecting older people and people with severe and 
enduring difficulties (Grob, 2011). Other examples of program bias are where 
services do not adequately cater for the needs of physically disabled users or of 
users from certain ethnic groups (possibly because of unconscious racism). 

• Unforeseen influences, such as where the service is located, for example, if it is poorly 
served by public transportation (strictly speaking, this is an aspect of structure rather 
than process). These factors may again reflect unconscious program bias. 

Undercoverage occurs when some people in the target population have unmet 
needs. This is often a problem in face-to-face psychology services, as there are usually 
many people in the community who need the service but do not get it 
(Bebbington et al., 2000b; Wang et al., 2005). Overcoverage occurs when some 
inappropriate targets are served. For example, in health promotion campaigns (e.g., to 
reduce smoking or to promote safer sex), material may inevitably be 
directed at some people outside the target population. This is usually not a 
significant problem. 
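Given estimates of the target population (from the methods described earlier in the chapter) and counts from service records, coverage and bias reduce to a few proportions. The Python sketch below is one simple way of computing them; the subgroup labels and figures are hypothetical, and the `bias_gap` summary (largest minus smallest subgroup coverage) is an illustrative index rather than a standard measure.

```python
def coverage_report(target_sizes: dict, served_in_target: dict,
                    served_outside_target: int) -> dict:
    """Summarize coverage and bias from population estimates and service records.

    target_sizes: estimated size of each subgroup of the target population.
    served_in_target: clients served from each of those subgroups.
    served_outside_target: clients served who are outside the target population.
    """
    rates = {g: served_in_target.get(g, 0) / n for g, n in target_sizes.items()}
    reached = sum(served_in_target.get(g, 0) for g in target_sizes)
    total_served = reached + served_outside_target
    return {
        "coverage_by_subgroup": rates,
        "undercoverage": 1 - reached / sum(target_sizes.values()),  # target not reached
        "overcoverage": served_outside_target / total_served if total_served else 0.0,
        "bias_gap": max(rates.values()) - min(rates.values()),  # illustrative index
    }


report = coverage_report(
    target_sizes={"under 40": 400, "40 and over": 600},     # hypothetical estimates
    served_in_target={"under 40": 120, "40 and over": 60},  # hypothetical records
    served_outside_target=20,
)
for key, value in report.items():
    print(key, value)
```

Here the service reaches 30% of younger people in the target population but only 10% of older people, a pattern of differential participation that would prompt questions about creaming or accessibility.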


Assessing Coverage 

Several methods can be used to assess coverage: 

• Service records are the most obvious and commonly used method. Most psychology 
services keep records of basic client information. They can be analyzed 
according to demographic characteristics, for example, client gender, age, or 
ethnicity, and possibly also according to clinical characteristics such as presenting 
problem or referral source. 

• Surveys can be used when services are not targeted at defined groups of individuals, 
but at an entire community. They are more appropriate for preventive, health 
education or health promotion services. For instance, Barker, Pistrang, Davies, 
Shapiro, and Shaw (1993) assessed the coverage of a BBC television series on preventive 
mental health. Although it was viewed by a national audience, the series 
was primarily aimed at certain subgroups of the population, that is, those people 
who were experiencing psychological problems themselves or who had a friend or 
relative who was. A national survey was used to estimate the nature of the viewing 
audience and their reactions to the series. 

• Analysis of dropouts can be used to assess bias, by comparing people who participate 
fully in a service with those who drop out before the end. A high dropout rate 
clearly indicates that something is wrong with the service. It may reflect client 
dissatisfaction, or conditions in the community that prevent full participation 
(e.g., stigmatization of service users). Data on dropouts can come from service 
records or from surveys designed to find nonparticipants. Such data help identify 
subgroups of the target population who are not receiving the service. It may be 
possible to ask nonusers about their perceptions of the service and then to use 
their opinions to redesign the intervention to be more suited to their needs (an 
example of the formative use of evaluation). 
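Comparing dropout rates across subgroups is a simple first analysis of service records. A minimal sketch, assuming each record has already been reduced to a (subgroup, completed) pair; the subgroup labels, here by referral source, and the records themselves are hypothetical:

```python
from collections import Counter


def dropout_rates(records):
    """Dropout rate per subgroup.

    records: iterable of (subgroup, completed) pairs, one per client who started.
    """
    started, dropped = Counter(), Counter()
    for subgroup, completed in records:
        started[subgroup] += 1
        if not completed:
            dropped[subgroup] += 1
    return {g: dropped[g] / started[g] for g in started}


# Hypothetical extract from service records.
records = [("referred by GP", True), ("referred by GP", False),
           ("self-referred", True), ("self-referred", True),
           ("self-referred", False), ("self-referred", True)]
print(dropout_rates(records))  # {'referred by GP': 0.5, 'self-referred': 0.25}
```

A markedly higher rate in one subgroup suggests whom to approach, for example with a short survey of nonusers, to find out why.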


Service Implementation 

Monitoring service coverage focuses on whom the service is reaching; monitoring 
service implementation or delivery focuses on what kind of service the users are 
getting. Is the service’s delivery consistent with its design specifications, that is, is it 
delivering what it is supposed to be delivering, according to its aims and objectives? You 
can look at both descriptive aspects, to label what components of the service are 
given, and quality aspects, to describe how well they are given. Implementation can 
be assessed by: 

• Observation (qualitative or quantitative) in the clinical setting. 

• Service records, for example, in antenatal care, to ensure that the right number of 
visits was made and the correct things done at each one. Standard clinical records 
can be augmented by asking clinicians to complete a checklist of activities. They 
can be given a standard form to tick off each procedure as it is completed, for 
example, in HIV pre- and post-test counseling, or an audit team can review the 
case note files at regular intervals to make sure that they are complete and that 
proper procedures are being followed. 

• Management information systems and computerized case registers can keep track 
of the type of service each client received at each visit. 

• Service-user surveys may be desirable when it is not possible to obtain user data 
routinely as part of service activities, or when the size of the target group is 
large and it is more efficient to do a sample survey than to obtain data on all 
participants. You can ask the clients about what kind of service they actually 
received. A natural step if you are doing this is also to ask them about their 
satisfaction with the service and what its impact was, which leads into the final 
area, outcome evaluation. 
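The checklist approach can be audited mechanically once case notes are recorded in a structured form. The sketch below assumes each case’s completed procedures are stored as a set of labels; the procedure names, loosely modelled on HIV test counseling, are hypothetical placeholders for a service’s own protocol.

```python
# Hypothetical protocol steps; a real service would list its own procedures.
REQUIRED_STEPS = (
    "consent recorded",
    "pre-test counseling",
    "result given",
    "post-test counseling",
)


def audit_case_notes(case_notes: dict, required=REQUIRED_STEPS) -> dict:
    """Return the required steps missing from each incomplete case file.

    case_notes: maps a case ID to the set of steps ticked off on its checklist.
    """
    gaps = {}
    for case_id, completed in case_notes.items():
        missing = [step for step in required if step not in completed]
        if missing:
            gaps[case_id] = missing
    return gaps


notes = {
    "case-001": set(REQUIRED_STEPS),
    "case-002": {"consent recorded", "result given"},
}
print(audit_case_notes(notes))
# {'case-002': ['pre-test counseling', 'post-test counseling']}
```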


OUTCOME EVALUATION 


• Outcome evaluation examines the impact of the service. 

• One key area of impact is the extent of clinical benefit. 

• Necessity often dictates using a simple one-group pretest-posttest design. 

• Outcome is assessed using measures linked to the objectives of the service. 

• Studies often include measures of user satisfaction. 

• It is important to address economic variables in addition to psychological ones. 


Outcome evaluation examines the impact of the service. It asks the crucial question 
“Do users benefit from this service?” Benefits may be manifest in the form of an 
improvement in the target problem (sometimes known as “health gain”) or in the 
form of changes in attitude about the problem, so that the client, or other 
stakeholders such as parents or carers, experience it as less problematic. 

It is also important to bear in mind the possibility of negative outcomes. Many 
programs, especially in the political arena, are subject to the law of unintended 
effects. In other words, policy changes that are intended to make matters better 
actually end up by making them worse. Or they make improvements in one area at 
the expense of deterioration in another. An example within clinical psychology is 
“psychological debriefing” for victims of traumatic incidents. Such programs were 
established with the plausible rationale that having counseling immediately after a 
trauma might prevent the development of post-traumatic stress disorder. However, 
research has shown that, far from helping many victims, psychological debriefing 
may often make them worse off than if they had had no intervention at all (Mayou, 
Ehlers, & Hobbs, 2000). 

Assessing outcome involves applying the research methods that we have discussed 
in previous chapters, in so far as it can be done within the constraints of the 
clinical setting. The first step is to choose outcome measures that capture the key 
objectives of the intervention, for example, an intervention aimed at helping 
anxious adults might use the Generalized Anxiety Disorder scale (Kroenke et al., 
2007). The second step is to select a research design that will assess any changes in 
those measures and, if possible, enable such changes to be attributed to the 
intervention itself rather than to other variables (Cook & Campbell, 1979; see also 
Chapter 8). Of course, in many working services, this is a counsel of perfection, 
and the evaluator may have to be content with drawing inferences from less than 
adequate designs or measures. 

Recall the efficacy versus effectiveness research distinction, which we examined in 
Chapter 8. (Efficacy research uses randomized designs, often in highly controlled 
research clinic settings; effectiveness research uses nonrandomized designs to study 
interventions as they actually happen in real-world settings.) Evaluation research is 
by definition effectiveness research, and will often use a simple design, such as the 
one-group pretest-posttest design, although this may be augmented using 
benchmarking (Leach & Lutz, 2010), that is, comparing outcomes with those from other 
similar services. 
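For a one-group pretest-posttest design, one common summary is a standardized pre-post effect size: the mean change divided by the pretest standard deviation, which can then be set against a benchmark from comparable services. The Python sketch assumes that convention (others divide by the standard deviation of the change scores), scores on a measure where lower is better, such as the GAD-7, and a benchmark value invented for illustration rather than taken from any published study.

```python
from statistics import mean, stdev


def prepost_effect_size(pre: list, post: list) -> float:
    """Mean improvement divided by the pretest SD (lower scores = better)."""
    return (mean(pre) - mean(post)) / stdev(pre)


# Hypothetical anxiety scores for five clients at intake and at discharge.
pre = [15, 12, 18, 10, 15]
post = [8, 7, 12, 6, 9]

BENCHMARK = 1.0  # hypothetical effect size reported by comparable services

d = prepost_effect_size(pre, post)
print(f"effect size = {d:.2f}; at or above benchmark: {d >= BENCHMARK}")
```

Without a control group the change cannot be attributed to the service itself, but the benchmark at least indicates whether its results are in line with those of similar services.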

Naturalistic field research of this sort is clearly imperfect from the point of view of 
internal validity. The issue here, however, is whether it is good enough to draw 
plausible conclusions that can aid practical decisions (Seligman, 1995). Managers and 
policy makers are often more convinced by research conducted in their own service, 
even if it is scientifically flawed, than by a methodologically sound piece of research 
published in a reputable scientific journal, which was conducted in another setting by 
other investigators. 


Client Satisfaction Surveys 

One important area of study is client satisfaction research (Lebow, 1982). Before any 
more sophisticated evaluation designs or outcome measures are deployed, it is 
important to ascertain that clients find the service acceptable and beneficial. If not, they will 
not attend and the service will not receive funding. Clients’ views of the service they 
have received are usually assessed via standardized self-report instruments such as the 
Client Satisfaction Questionnaire (CSQ: Larsen et al., 1979), which can be adapted 
to most clinical services. 

Because client satisfaction surveys rely on retrospective self-report methods, their 
validity is open to criticism (Shadish et al., 2002; Seligman, 1995; see also Chapters 6 
and 8). Some mental health professionals can be dismissive of clients’ views: “patients 
say any treatment is helpful, even useless treatments like aromatherapy, so what is the 
point of asking for their perspective.” However, it seems wrong to use these validity 
problems to dismiss the whole enterprise out of hand (Lebow, 1982). Professionals’ 
views of the effectiveness of the services they deliver also suffer from validity 
problems; service evaluation ideally needs to take both perspectives into account, and 
preferably that of third parties (e.g., family members) as well. Positive response sets in clients’ 
reports can be avoided to some extent by asking clients explicitly to list any problems 
with or complaints about the service (Parry, 1992), and validity threats can be taken 
into account when interpreting the findings. 

One published example of client satisfaction research is the Consumer Reports study 
(Seligman, 1995), which we discussed in Chapter 8. It used a large, though 
unrepresentative, sample to gather post hoc consumer views from people who had had 
psychological therapy, looking, for example, at their satisfaction with different 
therapeutic modalities and orientations. Seligman (1995), while acknowledging the 
problems with this approach, highlighted its usefulness in terms of examining how 
people experience therapy as it is actually conducted in the real world. 


Patient-focused Research and Outcomes Management 

One distinctive approach to evaluation research in clinical services is case tracking or 
patient-focused research (Barkham et al., 2010; Lambert, 2001). This sets out to 
evaluate the outcome of individual clients, in contrast to the groupings of clients that are 
the usual focus of evaluation research. The basic procedure is to compare each client’s 
progress throughout the therapy against the trajectory that would be expected, given 
that client’s initial clinical status. Such trajectories can be established via normative 
research, often involving thousands of clients. If the client’s progress departs from the 
trajectory, that can be noted and acted upon. If normative data are unavailable, in 
practice it is sufficient simply to note instances in which clients show reliable improvement 
or deterioration on the particular outcome measure (see Chapter 12). It is also 
possible to use idiographic (individualized) measures as well as nomothetic ones (Sales & 
Alves, 2012). 
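One widely used way of deciding whether an individual client’s change is reliable, rather than measurement error, is Jacobson and Truax’s reliable change index. A minimal Python sketch, assuming a measure on which lower scores are better and normative values for the pretest standard deviation and reliability (the figures below are hypothetical):

```python
from math import sqrt


def reliable_change(pre: float, post: float, sd_pre: float, reliability: float,
                    lower_is_better: bool = True) -> str:
    """Classify one client's pre-post change using the reliable change index."""
    standard_error = sd_pre * sqrt(1 - reliability)
    s_diff = sqrt(2 * standard_error ** 2)  # SE of the difference between two scores
    rci = (post - pre) / s_diff
    if lower_is_better:
        rci = -rci                          # positive values now mean improvement
    if rci > 1.96:
        return "reliable improvement"
    if rci < -1.96:
        return "reliable deterioration"
    return "no reliable change"


# Hypothetical normative values for the outcome measure.
SD_PRE, RELIABILITY = 5.0, 0.84

for pre, post in [(16, 9), (14, 11), (10, 17)]:
    print(pre, post, reliable_change(pre, post, SD_PRE, RELIABILITY))
```

With these values a change of about 5.5 points or more is needed before it is counted as reliable.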

Feeding back to clinicians information on each client’s outcome as the therapy 
progresses, particularly if the client is doing less well than would be expected, has 
been shown to enhance the final clinical outcome (e.g., Shimokawa et al., 2010). 
Furthermore, this information on the progress of individual clients can be used to 
initiate improvements in the service delivery system, an approach known as outcomes 
management (e.g., Lyons et al., 1997). 


Cost-effectiveness 

A final issue to consider is cost-benefit or cost-effectiveness evaluation (Krupnik & 
Pincus, 1992; Mangen, 1988; Rossi et al., 2004). This compares the service’s costs 
with its outcomes, in order to ensure that its funds are being well used. In economic 
terms, it compares inputs to outputs. This kind of evaluation has become more 
prominent in both the United States and the United Kingdom, as purchasers of 
health-care services (in the United States, health maintenance organizations or 
insurance companies; in the United Kingdom, clinical commissioning groups) must 
decide what to spend their limited resources on. Their decisions will be based on 
which services they think will give the greatest outcome per unit of resource 
employed. 

There are clearly problems in measuring both input and output. At the input end, 
costing must take into account both direct costs, principally psychologists’ contact 
time, and overheads, such as the cost of buildings, equipment, and support staff 
(Cape, Pilling, & Barker, 1993). 
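At its simplest, input costing spreads the direct and overhead costs over the clinical contact they support. The sketch below shows that arithmetic with hypothetical annual figures; it is not the specific costing method of Cape, Pilling, and Barker (1993), and a real analysis would need to decide how to apportion shared overheads.

```python
def cost_per_contact_hour(direct_costs: float, overheads: float,
                          contact_hours: float) -> float:
    """Full cost of one hour of client contact, including overheads."""
    if contact_hours <= 0:
        raise ValueError("contact_hours must be positive")
    return (direct_costs + overheads) / contact_hours


# Hypothetical annual figures for one clinician.
hourly = cost_per_contact_hour(direct_costs=60_000, overheads=30_000, contact_hours=900)
course_of_12_sessions = 12 * hourly  # assumes one-hour sessions
print(hourly, course_of_12_sessions)  # 100.0 1200.0
```

Ignoring overheads in this example would understate the cost of each contact hour by a third.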

The output end of the calculation is even more problematic, since there is no 
universally agreed upon measure of effectiveness or of benefit. Different healthcare 
services (e.g., heart surgery compared to psychiatric in-patient treatment) use different 
criteria to measure outcome. One possible solution, derived from health economics, 
is to combine quality of life and life expectancy into quality adjusted life years, or 
“QALYs.” Thus an outcome of a treatment, say an operation for cancer, may give a 
person a high quality of life for a short time or a medium quality of life for a longer 
time. These outcomes would be considered equivalent in terms of QALYs. Such an 
approach, although it fulfills the economists’ goal of giving a single index upon which 
to base resource allocation, clearly makes a number of problematic assumptions about 
how to weight quite different clinical outcomes (Cox et al., 1992) and is also difficult 
to apply to psychological interventions. 
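The QALY calculation itself is simple: each period of life is weighted by its quality, on a scale where 0 represents death and 1 full health, and the weighted years are summed. The quality weights below are hypothetical; they are chosen only to show how a shorter period at high quality and a longer period at medium quality can produce the same total.

```python
def qalys(periods):
    """Sum quality-weighted life years.

    periods: list of (quality_weight, years) pairs, with weights from 0 to 1.
    """
    for weight, _years in periods:
        if not 0 <= weight <= 1:
            raise ValueError("quality weights must lie between 0 and 1")
    return sum(weight * years for weight, years in periods)


high_quality_short = qalys([(0.75, 2)])  # 2 years at weight 0.75
medium_quality_long = qalys([(0.5, 3)])  # 3 years at weight 0.5
print(high_quality_short, medium_quality_long)  # 1.5 1.5
```

Treating these two outcomes as equivalent is an instance of the weighting assumptions discussed above.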

Another approach is to attempt to measure the economic burden of illness or 
psychological disorder in terms of lost productivity, increased social services 
expenditure, and increased use of medical services (e.g., general practitioner consultations, 
emergency room visits, or in-patient hospitalization). Then the outcome of a 
psychological intervention can be partly assessed by the savings made in terms of 
increased productivity and reduced social services and healthcare expenditure—often 
referred to as “cost offset”—which may represent a substantial financial return in 
relation to the expenditure on the psychological intervention (Krupnik & Pincus, 1992; 
Layard, Clark, Knapp, & Mayraz, 2007). 

An example of a cost-offset study is Humphreys and Moos’s (2001) analysis of the 
value of encouraging patients with substance abuse to participate in self-help groups 
as an adjunct to their treatment. Using a large all-male sample drawn from inpatients 
in U.S. Veterans’ Administration hospitals, the study compared inpatients in 
substance abuse programs that emphasized self-help groups with patients in programs 
that emphasized a traditional cognitive-behavioral approach, in terms of their 
subsequent health-care costs in the year following discharge. Humphreys and Moos 
found that the patients who were in the cognitive-behaviorally oriented programs 
received post-discharge health care costing an average of $12,100 per year, whereas 
patients in the self-help oriented programs received post-discharge care costing an 
average of $7,400 per year, thus demonstrating a significant cost offset for those 
programs that featured self-help groups as part of the therapy. 
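The cost offset in this study is the difference between the two average post-discharge costs. The sketch below reproduces that arithmetic from the figures reported above; scaling it to a cohort of 100 patients is a hypothetical extension, and it ignores both the costs of running the programs themselves and any uncertainty around the averages.

```python
def cost_offset(comparison_cost: float, program_cost: float, n_patients: int = 1) -> float:
    """Savings in subsequent health-care costs relative to the comparison condition."""
    return (comparison_cost - program_cost) * n_patients


per_patient = cost_offset(12_100, 7_400)              # per patient per year
cohort = cost_offset(12_100, 7_400, n_patients=100)   # hypothetical cohort of 100
print(per_patient, cohort)  # 4700 470000
```

The per-patient saving of $4,700 amounts to a reduction of roughly 39% in average post-discharge costs.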

A simple form of cost-effectiveness analysis, and one with direct relevance to 
practitioners, is to compare practitioner input, measured in terms of number of 
sessions, with client output in terms of clinical improvement. All therapists must 
ask themselves, implicitly or explicitly, whether it is better to give one client 20 
sessions or two clients 10 sessions (or 10 clients two sessions). Cost-effectiveness 
evaluation attempts to make the basis of such decisions explicit. Howard, Kopta,
Krause, and Orlinsky’s (1986) analysis of dose-response relationships in psychotherapy
falls under this heading. They used the statistical technique of probit
analysis on a data set drawn from 15 published studies to estimate the improvement
rate of clients after a given number of sessions. They estimated, for example,
that 53% of clients had improved by eight sessions and that 74% of clients had 
improved by 26 sessions. However, to be a true cost-effectiveness analysis, the 
input must then be expressed in monetary terms: that it costs so many dollars to 
produce such and such an outcome. 
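To illustrate that final step, the sketch below converts the two dose-response estimates quoted above into a cost per improved client. Those estimates are 53% improved by 8 sessions and 74% by 26 sessions. The fee of $100 per session is a hypothetical assumption and does not come from Howard et al. (1986). The calculation also assumes that every client receives the full number of sessions.

```python
SESSION_COST = 100  # hypothetical cost per session in dollars (assumption)

# (number of sessions, proportion of clients improved), from Howard et al. (1986)
dose_response = [(8, 0.53), (26, 0.74)]

def cost_per_improved_client(sessions, improved_rate, session_cost=SESSION_COST):
    """Cost of treating one client for `sessions` sessions, divided by
    the proportion of clients who improve at that dose."""
    return sessions * session_cost / improved_rate

for sessions, rate in dose_response:
    cost = cost_per_improved_client(sessions, rate)
    print(f"{sessions:>2} sessions: ${cost:,.0f} per improved client")

# Marginal cost: extra spending per client when moving from 8 to 26 sessions,
# divided by the extra proportion of clients who improve.
(s1, p1), (s2, p2) = dose_response
marginal = (s2 - s1) * SESSION_COST / (p2 - p1)
print(f"Cost per additional improved client: ${marginal:,.0f}")
```

Comparing the average and marginal figures is one way to make the "20 sessions for one client or 10 sessions for two" question explicit, given the assumptions stated above.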


CHAPTER SUMMARY 

Evaluation is applied research that aims to assess the worth of a specific service. It 
includes the areas of clinical audit, quality assurance, needs assessment, and outcomes 
evaluation, although each of these areas has its own distinct literature. Evaluation 
studies are conducted for the benefit of the various stakeholders in the service, 
although different stakeholders often have different priorities for the evaluation. 
Organizational and political issues are crucial, both in deciding the priorities of the 
evaluation, and in addressing the sense of threat that often accompanies it. To start 
with, the evaluator asks two central questions: “What is the service trying to do?” and 
“How will you know if it has done it?” Answering these questions is simpler if the 
service has explicit, agreed-upon aims and objectives, and a well-thought-out rationale
for why it is doing what it does.

Evaluations can address the process or the outcome of a service. Process evaluation 
examines who is coming to the service and what services they are being given. It is 
important to ensure that the service delivery does not unfairly favor certain subgroups
of the population at the expense of others. Outcome evaluation examines the impact
of the service: whether users benefit or not. In addition to examining clinical outcome,
evaluators may look at client satisfaction, and at economic indicators of costs
and benefits.


FURTHER READING 

Rossi et al.’s (2004) text, which we have drawn on extensively here, gives a comprehensive
framework for conducting a program evaluation. Shadish et al. (1991) present
the conceptual background and discuss the ideas of the major figures in the
modern American program evaluation movement. Weiss (1972), in one of the
founding texts of this movement, gives an excellent discussion of the rational and
irrational feelings about evaluation. Patton’s (2008) Utilization-Focused Evaluation is
also appealing for its broad, practical approach. Seligman’s (1995) Consumer Reports
study and the subsequent commentary on it (in the October 1996 special issue of
American Psychologist) give an airing to many of the issues raised by using imperfect
research designs to answer practical questions. The quality assurance and audit literature
as applied to clinical psychology is reviewed by Cape and Barkham (2002).


QUESTIONS FOR REFLECTION 

1. Apply program evaluation concepts to an intervention related to your research
interests. For example, what would your research look like if it were to be implemented
in real-world, community agency settings? What would the aims and
objectives of the intervention be? What is the target population? How might you 
assess the level of need? 

2. Individual case-tracking (i.e., patient-focused) methods are controversial, for 
example, Lambert’s use of early outcome data to designate an individual client’s 
treatment as “Off track,” “On track,” and “Cured” or “Nonclinical.” Some have 
criticized these methods as an unwarranted intrusion into psychotherapy and 
have argued that insurance companies or mental health services will inevitably 
misuse these methods to deny services to clients. What do you think? 

3. Cost-effectiveness and cost-benefit methods have also been controversial. Do 
you think that it is possible to properly attach monetary values to psychological 
distress costs or treatment benefits? Assuming that we could do it, should we? 
KEY POINTS IN THIS CHAPTER

• The final stage of the research process involves making sense of your data, first
for yourself, then for a wider audience.

• Analysis means establishing what the findings are and how they answer the 
research questions. 

• Qualitative analysis is an inductive process of developing a set of themes or 
a conceptual framework that captures the main ideas in the data. 

• Quantitative analysis can be exploratory or confirmatory, depending on the 
research questions. 

• The concepts of statistical conclusion validity, effect size, and clinical significance 
are used to evaluate the strength and meaningfulness of quantitative findings. 

• Interpretation involves understanding the psychological meaning of the 
results, and their scientific and practical implications. 

• Dissemination means communicating both the findings and your understanding 
of them to other people. 


Having collected the data, the final stage of the research process consists of making 
sense of the findings, first for yourself, then for a wider audience. This stage can itself be 
broken down into three parts: analysis, interpretation, and dissemination. Analysis 
means establishing what the findings are and how they answer the research questions, 
interpretation means understanding the findings in terms of their broader implications, 
and dissemination means communicating them to a wider audience. Analysis is typically 
reported in the Results section of a research paper; interpretation is reported in the
Discussion section. As always, the components overlap and intermingle: an interpretation
of the findings might suggest a further analysis of the data, or presenting the study
at a conference might lead to new ideas about its interpretation. However, for simplicity,
we will cover the three components as though they were distinct and sequential.

The goal of the analysis is simply stated: to use the data to answer the research questions.
We will look at the steps undertaken in the qualitative and quantitative approaches.
However, the specific techniques of data analysis, both qualitative and quantitative, 
involve specialized methods that are beyond our present scope. Here we will focus on 
the general strategies that researchers use when analyzing their data. 


QUALITATIVE DATA ANALYSIS 


• Qualitative analysis can take either a within-case (idiographic) or a cross-case 
approach. 

• The first step is usually to transcribe the data (paying attention to confidentiality
issues).

• It is important to become familiar with the data, by listening to recordings 
and rereading transcripts. 

• Qualitative analysis is an inductive procedure, which typically involves three 
related processes: identifying meaning, categorizing, and integrating. 


As described in Chapters 6 and 7, qualitative data come in various forms, such as transcripts
from interviews, field notes from observations, or other kinds of texts. They are usually
unstructured and often voluminous. Qualitative researchers are faced with finding ways
of systematically analyzing these collections of words. This problem is compounded by
the well-known issue of “qualitative overload”: qualitative investigations, even ones
with a small sample size, usually generate vast quantities of data. This abundance
of data needs to be analyzed and represented to the reader clearly and accurately.

Much has been written about qualitative data analysis. Key references include Braun
and Clarke (2006, 2013) on thematic analysis, Corbin and Strauss (2015) on grounded
theory, Potter and Wetherell (1987) and Potter (2012) on discourse analysis, and
Smith et al. (2009) on interpretative phenomenological analysis, as well as some more
general approaches (e.g., Creswell, 2013; Miles et al., 2014; Patton, 2002; Pope et al.,
2000). Several computer programs are also available to assist with qualitative
analysis (e.g., ATLAS.ti and NVivo), as is the online app Dedoose (http://www.dedoose.com/).
Some of these tend to favor a particular analytic approach (e.g., NVivo is based
on grounded theory analysis); others are more flexible.

Here we will attempt to sketch out some general principles in analyzing qualitative data. 
However, it is worth noting that different qualitative orientations use different methods 
of analysis, and also tend to use different vocabulary to describe similar procedures. 

Different orientations, and the goals of particular studies, also vary in the depth of 
interpretation or inference. Some approaches keep to a fairly descriptive level, without 
much interpretation, but most go beyond description: they attempt to explain the
meanings, causes, structures, or patterns in the phenomenon being examined. The 
degree of interpretation partly depends on the goal of the particular investigation. 

Flexibility is required in all phases of qualitative research, including the analysis. It is 
important to adapt the analytic method to the data, to the research question, and to 
your own cognitive style and talents. However, it is also important to be explicit about 
your procedures. In other words, flexibility does not mean vagueness or sloppiness. 


Within-case and Cross-case Analysis 

Regardless of one’s general approach to qualitative research, the data may be analyzed 
either within or across cases. Although the vast majority of studies report cross-case 
analyses, they normally start out with analyzing each individual case separately. 

Within-case analysis is idiographic in nature, concentrating on understanding the 
features of a single case, or a small number of cases. The material may be presented 
with minimal interpretation, for example by organizing it chronologically into a
narrative. In this descriptive, narrative approach, the researchers restrict themselves to
arranging the material into a story, which is allowed to speak for itself. Such presentations
can be an excellent way of demonstrating the existence of a phenomenon.
A classic example is Bogdan and Taylor’s (1976) study of “Ed Murphy,” which
demonstrates the existence of perceptive self-awareness in a young man labeled as being
“retarded” (in UK terminology, a person with intellectual disabilities). 

However, even when qualitative researchers focus on individual cases, they usually 
go beyond the descriptive level. This approach is consistent with clinical psychologists’ 
interest in understanding the meaning of individuals’ situations. For example, Varvin 
and Stiles (1999) studied a political refugee’s experience of therapy, using a particular 
theoretical framework (the “assimilation model”) to interpret the data. 

Cross-case analysis looks across individuals in order to identify common themes about 
the question being studied, aiming to see which aspects are shared across participants. 
Usually the researcher is also interested in describing variations within the phenomenon, 
that is, themes or patterns that characterize only some participants’ accounts. For 
example, Knox, Edwards, Hess, and Hill (2011) studied 12 trainee therapists’ perceptions
of self-disclosure by their supervisors. They identified several categories of the
consequences of supervisor disclosure, both positive and negative. Some were mentioned by
most supervisees (e.g., normalizing their experience) and some were less common (e.g., 
the supervisees themselves became more confident in disclosing in supervision). 


Preliminaries to Qualitative Data Analysis 

Data Preparation 

The first step in qualitative analysis is to prepare the data. In interview studies or 
studies using recorded interactions, this involves transcribing the recordings— 
usually a laborious and time-consuming process. 

Transcription is in fact a form of analysis (Riessman, 2008), because of the many 
theory-guided decisions that must be made along the way. For example, decisions 
must be made about how to break the speech or interaction into units, and how to 
record nonverbal and paralinguistic elements of speech (e.g., speech rate, loudness,
pausing). Even the seemingly straightforward task of writing down the spoken words 
turns out to be complicated by repetitions and irrelevancies, as well as our unconscious 
tendency to edit what we hear, ignoring things that don’t make sense and hearing 
what we expect to hear. Clearly, there is no single correct method of transcription; 
rather, the transcription method should be chosen for the task at hand. 

For transcribing a qualitative interview, where it is mostly the content, rather than 
how things are said, that is of primary interest, the best approach is the simple method 
illustrated in the sample interview presented in Chapter 6 (see also the box below for
guidelines). However, for research requiring the careful analysis of moment-by-moment
interaction, many researchers use Jefferson’s system, presented, for example,
in ten Have (1999), which records the length of pauses and the exact beginning and end of
interruptions, among other things (see also Mergenthaler & Stinson, 1992, for widely
used standards for transcribing therapy sessions).

Before beginning the analysis, it is vital to check the transcripts for accuracy, and 
also to ensure that participants’ anonymity is preserved by removing or disguising 
names and other identifying information. In research using multiple perspectives and 
sources of information (e.g., Comprehensive Process Analysis: Elliott et al., 1994),
the different types of data must also be collated. 


Guidelines for transcribing qualitative interviews 

• The principal requirements are accuracy and readability. The transcript 
should convey the words and quality of the actual speech: what was said and 
how it was said. 

• Prefer short sentences to long ones. Use commas to indicate subclauses 
(as consistent with normal punctuation). 

• Indicate special emphases by question marks, exclamation marks and italics. 

• Use quotation marks to indicate that the participant is reporting the speech 
of other people. 

• Use round parentheses to give paraverbal features of the interaction (laugh,
sigh, etc.) or to indicate inaudible material: (inaudible).

• Use standard spellings as much as possible, even for nonstandard 
pronunciations. 

• Spellings of some common paraverbal utterances: mm, mm-hm, um, er. 

• Use square brackets to enclose names omitted to preserve confidentiality: 
she went to talk to [her sister]. 

• Also use square brackets to include back-channel overtalk that doesn’t interrupt 
the speaker’s flow: Participant: I want to go [Interviewer: Yes] back to work. 

• Indicate silences of 5 seconds or more in round parentheses: (10 sec silence).

• If a speaker is interrupted, so that their flow of speech is cut off by the
following speaker, use two slashes: //

Adapted from Auld & White (1956), Mergenthaler & Stinson (1992), Potter & Wetherell (1987),
and Stiles (1992).
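When transcripts are typed up electronically, part of the anonymization step can be automated once the researcher has listed the identifying names. The following is a minimal sketch. It assumes a hypothetical, hand-compiled mapping from names to bracketed descriptions, following the square-bracket convention in the guidelines above. A pass of this kind cannot catch identifying details it has not been told about, so the transcript still needs to be checked by eye.

```python
import re

def anonymize(transcript, replacements):
    """Replace each identifying name with a bracketed description.

    `replacements` maps a name (e.g. "Susan") to the text to show inside
    square brackets (e.g. "her sister"). Matching is whole-word and
    case-sensitive, so "Susan" will not match inside "Susanna".
    """
    for name, description in replacements.items():
        pattern = r"\b" + re.escape(name) + r"\b"
        bracketed = "[" + description + "]"
        # A function replacement avoids any escape processing of the text
        transcript = re.sub(pattern, lambda _match: bracketed, transcript)
    return transcript

# Hypothetical transcript fragment and names, for illustration only
text = "Then she went to talk to Susan about it."
names = {"Susan": "her sister"}
print(anonymize(text, names))
```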

Immersion

Before the formal analysis, it is important to immerse yourself in the data, by listening 
to the recordings and by rereading the transcripts. This gives you an overall feel for 
the data’s scope and meanings, although you may not be able to articulate your 
understanding at this stage. Systematic understanding is a product of the formal 
analysis. 


Processes in Qualitative Data Analysis 

Qualitative data analysis (particularly thematic analysis approaches) can typically be 
thought of as involving three generic sets of processes, which we will call identifying 
meaning, categorizing, and integrating. Naturally, this division is a simplification: the 
processes represent related activities and the boundaries between them are not always 
distinct. Furthermore, the processes are not linear: researchers often cycle back and 
forth between them. Different approaches to qualitative analysis make use of different 
forms and mixes of these activities. However, they all describe an inductive process, 
which is common to most qualitative research, of going from the raw data to ideas 
about the data. 

It is usually unwise to begin by analyzing the transcript as a whole: global analysis 
encourages ignoring data that do not fit one’s expectations or emerging
understanding. For this reason, most approaches adopt some type of microanalysis, paying
close attention to words and phrases. Some approaches divide the data into units
before coding, the most common division being the meaning unit, which consists of
material on a single point in the participant’s description, often approximating a
verbal sentence. Its use is largely a practical strategy; its exact definition is not critical 
and may vary among researchers and among studies with different research 
objectives. 

Identifying Meaning 

The researcher begins the formal part of the analysis by going through the data and 
trying to identify the ideas that are being expressed. In this first stage, the researcher 
attaches tentative labels, often called initial codes, to the text. The codes are 
attempts to sum up the essence of what is being said in each phrase, sentence, or 
other portion of text. In the second stage, the codes will then be integrated into 
larger themes or categories, but the initial task is one of identifying meaning. 
However, texts can be understood in many ways, and different approaches to 
qualitative research (as described in Chapter 5) use different ways of highlighting 
the meaning of the data. 

Codes are not necessarily mutually exclusive; that is, a particular unit of data may
be assigned several different codes. For example, consider one statement from
a peer supporter in the Pistrang et al. (2013) Women Helping Women study, which 
we used as an example in Chapter 6: “At the beginning, it was just a fine balancing act 
of making sure she knew that I really had gone through it [the illness] and that I was 
very willing to talk about anything she wanted to talk about. But, at the same time, 
that it was about her.” When first encountered early in the analysis, this statement was 
given two different codes: “Reciprocity: a two way process” and “Self disclosure— 
striking a balance.” 
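Because codes are not mutually exclusive, it can help to think of coding as a many-to-many mapping between units of text and codes. The sketch below is a hypothetical data structure and is not a feature of any particular qualitative analysis program. It stores the peer supporter’s statement with its two codes (the participant identifier "P1" is invented) and builds an index from each code back to the units it was applied to.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class MeaningUnit:
    participant: str                      # hypothetical participant identifier
    text: str
    codes: list[str] = field(default_factory=list)

units = [
    MeaningUnit(
        participant="P1",
        text=("At the beginning, it was just a fine balancing act of making sure she "
              "knew that I really had gone through it [the illness] and that I was very "
              "willing to talk about anything she wanted to talk about. But, at the same "
              "time, that it was about her."),
        codes=["Reciprocity: a two way process", "Self disclosure - striking a balance"],
    ),
]

def index_by_code(units):
    """Map each code to the list of meaning units it has been applied to."""
    index = defaultdict(list)
    for unit in units:
        for code in unit.codes:
            index[code].append(unit)
    return dict(index)

for code, coded_units in index_by_code(units).items():
    print(f"{code}: {len(coded_units)} unit(s)")
```

Keeping the index separate from the units makes it straightforward to go back from a code to every passage that carries it, which is useful later when themes are checked against the original data.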
One issue is how closely the codes stay to describing explicit meaning, as opposed
to interpreting implicit meaning. Most approaches tend to stay fairly close to the text
at this point, attempting to condense what is expressed into a brief phrase or label
(e.g., “a balancing act”), but the coding may involve some translation into psychological
language. Other, more interpretive, approaches may focus on bringing out what is
said implicitly “between the lines.” Some interpretive and critical approaches take this 
a step further and may read beyond the speaker’s apparent awareness, based on some 
interpretive theory (e.g., feminist theory or psychoanalytic theory). This approach is 
necessary when the analyst is dealing with apparent self-deception on the part of the 
respondent; however, it may not sit well with readers who do not share the analyst’s 
interpretive framework. 

Phenomenologically oriented researchers may take a stance of psychological reflection,
which Wertz (1985) has described in terms similar to therapeutic empathy. In particular, 
he describes a process of “entering and dwelling,” in which the researcher attempts to 
immerse themselves in the participant’s world. The researcher tries to slow down the 
story and to dwell on its details and meanings, setting aside (bracketing) the assumption 
that they already understand what is being described. At the same time, the researcher 
tries to step back from the description, attending to meanings rather than matters of 
truth or falsity. Often, this process incorporates a dialogue with the data, in which the 
researcher asks such questions as, “What is really meant here?”, “What kind of thing is
being described?”, and “How does this relate to the phenomenon I’m trying to
understand?” This method is both descriptive and interpretive in that it tries to stay close to
the data while still generating a deeper understanding through drawing out the
participant’s implicit meanings and assumptions. Such approaches are not simply
inductive but often involve a process of creative reading of the data that Rennie (2012) and
others refer to as abduction (see Chapter 2). 

Although different approaches to qualitative analysis emphasize different ways of 
reading data, in actual practice there is much overlap. In carrying out a qualitative 
analysis, you may want to consider making flexible use of a range of such activities, as 
appropriate to your data and research questions. 

Categorizing 

All forms of qualitative analysis engage in some form of theme generation, in which the 
researcher groups together important concepts or ideas. The previous (“identifying 
meaning”) phase usually results in a tentative set of labels corresponding to the set of 
ideas in the data; the task now is to organize them conceptually. 

The process of giving initial labels leads to the identification of key concepts, often 
referred to as categories or themes (we will use the terms interchangeably). Such themes 
usually consist of a word or phrase that captures an essential meaning, often at a more 
abstract level (e.g., “normalization,” “unblaming the self,” “fall from grace”). The theme 
names may derive from numerous sources, including the respondent’s own words, the 
research literature, or metaphor. The development of themes is an interpretive process 
that involves both close attention to the meanings expressed in the data and the 
researcher’s own ideas. 

There is rarely one definitive set of themes: it is more often the case that there are
several possible ways to categorize the data. Each researcher will have their own perspective,
partly depending on their epistemological leanings, but in practice there is often a
common core of overlap between the themes produced by different analysts (Madill
et al., 2000). What is important is that the analysis be transparently grounded in the data.

Sometimes, the data are first organized into large domains corresponding to major
aspects of the phenomenon being studied. These domains do not provide answers to
the research questions; rather, they are ways of dividing up the data in order to make
it easier to analyze. They represent divisions that could have been made before the 
data were collected. Domains may be structured in different ways, for example, 
corresponding to topics addressed by the research questions, or following a broad 
narrative structure (e.g., “background,” “event,” and “aftermath”). 

The analysis begins at an individual level, taking each case sequentially. As further 
cases are added, the researcher attempts to identify patterns across the cases. 
The constant comparative method, originally articulated by Glaser and Strauss (1967), 
is valuable here. It essentially involves exploring similarities and differences between 
the emerging themes. As new themes are identified, they are compared to the other 
ones in the current set. If an idea is similar to an existing theme, it is added to that 
theme and may help to clarify or elaborate it. If it is judged to differ from existing 
themes, it is noted as a possible new theme. 

The process of categorization continues until saturation is reached; that is, until 
themes are no longer added or elaborated. As the analysis proceeds, the researcher will 
discover that fewer and fewer themes are added, and the analytic process becomes 
more and more one of coding data into existing themes. 
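Judging whether a new idea is similar to an existing theme is an interpretive decision that no program can make for the researcher. The bookkeeping side of saturation can, however, be sketched. Given the themes the researcher has assigned to each successive case, the function below counts how many themes each case contributes for the first time. The case data are hypothetical. The stopping rule (a run of three cases that add nothing new) is an illustrative assumption, not a standard criterion.

```python
def new_themes_per_case(cases):
    """For each case in order, return the themes not seen in any earlier case."""
    seen = set()
    additions = []
    for themes in cases:
        new = [t for t in themes if t not in seen]
        seen.update(themes)
        additions.append(new)
    return additions

def saturation_reached(cases, run_length=3):
    """True if the last `run_length` cases added no new themes (assumed rule)."""
    counts = [len(new) for new in new_themes_per_case(cases)]
    return len(counts) >= run_length and all(c == 0 for c in counts[-run_length:])

# Hypothetical theme labels assigned to six successive cases
cases = [
    ["A", "B", "C"],
    ["A", "D"],
    ["B", "C", "E"],
    ["A", "E"],
    ["B", "D"],
    ["A", "C"],
]

for i, new in enumerate(new_themes_per_case(cases), start=1):
    print(f"Case {i}: {len(new)} new theme(s) {new}")
print("Saturation reached:", saturation_reached(cases))
```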

Integrating 

As themes begin to be identified in the data, the researcher attempts to make 
connections between them. The aim is usually to create some sort of conceptual 
framework, rather than a list of unrelated themes. Often a hierarchical structure can 
be identified, with lower order categories, often called subthemes, setting out the 
constituent parts of higher order themes. For example, in a study of the benefits of 
mutual support groups for parents of children with disabilities (Solomon, Pistrang, & 
Barker, 2001), three broad categories were identified: “Control and agency in the 
world,” “Sense of belonging to a community,” and “Self-change.” Each of these 
subsumed several lower order categories: for example, “Sense of belonging” included 
“Being understood,” “Sharing emotions,” and “Friendships.” In the grounded theory 
approach, researchers attempt to identify a single, higher order, unifying category— 
often referred to as a core category. It is intended to capture the essence of the 
phenomenon, the headline of the story. For example, in Solomon et al. (2001), the 
core category of “Identity change” captured the essence of the three broad categories 
listed above. 
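A hierarchical category structure of this kind maps naturally onto a tree. The sketch below encodes the Solomon et al. (2001) structure described above and prints it as an indented outline, similar in spirit to the tree diagrams used by grounded theorists. Only the lower order categories named in the text are included. Those of the other two broad categories are left empty rather than guessed.

```python
from dataclasses import dataclass, field

@dataclass
class Theme:
    name: str
    subthemes: list["Theme"] = field(default_factory=list)

    def outline(self, depth=0):
        """Return this theme and all its subthemes as indented lines."""
        lines = ["  " * depth + self.name]
        for sub in self.subthemes:
            lines.extend(sub.outline(depth + 1))
        return lines

# Core category at the top, broad categories below it, then lower order categories
core = Theme("Identity change", [
    Theme("Control and agency in the world"),
    Theme("Sense of belonging to a community", [
        Theme("Being understood"),
        Theme("Sharing emotions"),
        Theme("Friendships"),
    ]),
    Theme("Self-change"),
])

print("\n".join(core.outline()))
```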

The process of integrating is cyclical and interpretive, involving refining and linking 
concepts, going back to the original data to check for accuracy, and revising the 
categories and the framework. 

It is often helpful in the analytic process, and in the final presentation of the results, 
to depict the themes and their connections in a visual format. This takes different 
forms in different approaches. Grounded theorists, for example, often use tree 
diagrams that depict hierarchical category structures with the core category at the 
top. Others may use a mind map or concept map style of diagram (Eppler, 2006);
many others simply set out the thematic structure in a table. The aim is to provide the 
reader with a clear and coherent picture of the findings. 


Good Practice in Qualitative Analysis 

As described in Chapter 5, much has been written about ways of evaluating qualitative 
research. Several of these criteria pertain specifically to analysis. For example, it has 
been widely recommended that researchers implement procedures for checking the 
credibility of their analysis: these include using several analysts, having another 
researcher audit the data trail, and establishing testimonial (or respondent) validity by 
checking back with the original respondents or others like them (Elliott, Fischer, & 
Rennie, 1999; Mays & Pope, 2000; Stiles, 1999; Yardley, 2000). 

It is important for the researcher to consider whether the research questions 
have been thoroughly and clearly answered. What remaining ambiguities are there? 
How could the analysis have been continued? Is the analysis coherent and 
integrated, without oversimplifying the data? Does the analysis illuminate the 
phenomenon? These questions, which are also suggested in guidelines for evaluating 
qualitative research (see Chapter 5), need to be addressed by researchers throughout 
the analysis. 


QUANTITATIVE DATA ANALYSIS 


Steps in quantitative analysis: 

• data entry 

• data checking 

• data reduction 

• exploratory analyses 

• statistical significance testing for answering the research questions 

• analyzing the strength and clinical significance of effects. 


Before formal quantitative analyses or hypothesis testing can be carried out, researchers 
need to prepare their data and explore its general properties. 


Data Entry 

If the data were collected using a written questionnaire or quantitative interviews, 
they are usually entered manually into a spreadsheet, such as the SPSS data editor, or 
Microsoft Excel. If they were collected using a computerized interface, they can usually
be downloaded directly into Excel. Prior to data entry, the variables must all be
named and defined, and any labels or codes for various values entered (e.g., the sex of 
the participant might be coded as 1 = male, 2 = female). 
In the common case of a multiple-item scale, it is usually better to enter the scores on all
the items, rather than just the total scale score, in order to reduce scoring errors and to
allow for the possibility of analyzing the scale’s reliability or factor structure. Any
reverse-scored items need to be recoded so that their values are consistent with the rest
of the items in the scale. This can be done manually before the data are entered, but it is
usually simpler and more reliable if the computer performs the recoding, creating new,
reversed variables.
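Here is a minimal sketch of this recoding step, written in plain Python rather than with the SPSS procedures mentioned in the text. For an item scored from `low` to `high`, the reversed score is `low + high - x`, so on a 1 to 7 scale a 7 becomes 1 and a 2 becomes 6. The item names, the responses, and the choice of which items are reverse-scored are all hypothetical.

```python
def reverse_score(x, low=1, high=7):
    """Reverse a response on a low..high scale; missing values (None) stay missing."""
    if x is None:
        return None
    return low + high - x

# Hypothetical responses from one participant to a four-item scale scored 1-7
responses = {"item1": 6, "item2": 2, "item3": 7, "item4": None}
reverse_keyed = ["item2", "item4"]  # hypothetical reverse-scored items

# Create new, reversed variables (e.g. "item2_r") alongside the originals,
# so that the raw entries remain available for checking against the source.
recoded = dict(responses)
for item in reverse_keyed:
    recoded[item + "_r"] = reverse_score(responses[item])

print(recoded)
```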


Data Checking 

Data errors can arise either from typing mistakes at data entry or from incorrect 
computer commands. It is important to check for both possibilities. In order to ensure 
that the data have been entered correctly, it is a good idea to proofread the entries by
asking someone to read them aloud from a printout and checking them against the
original source. Rosenthal (1978) estimated that, on average, about 1% of data points
are wrongly entered. Sometimes computer scan sheets can be used in order to eliminate
typing errors, but these also need to be checked to ensure that they have been properly
filled in. Entering data via a computerized interface, such as an online questionnaire,
avoids data entry errors by the researchers, but not, of course, by the participants.

To check that the data are being processed correctly, some simple descriptive analyses
can be performed (Tabachnick & Fidell, 2013). For nominal scale data, frequency analyses
can be used; for interval scale data, summary descriptive statistics are also useful, including 
the mean, the standard deviation, minimum and maximum values, and the number of 
valid observations. These also provide some basic statistics that you will probably need 
for the Results section of your research report. For some descriptive studies, for example, 
opinion surveys or consumer satisfaction research, knowledge of the frequency 
distributions may be all that is required to answer the research questions. 

Descriptive analyses also help you to check that missing values are being handled 
properly, whether there is any systematic pattern to the missing data, and that there 
are no out-of-range values (e.g., “56” entered for a variable that is supposed to range 
from 1 to 7). Missing data can be estimated statistically (Schlomer, Bauman, & Card, 
2010), using a procedure such as multiple imputation, which is available in SPSS. 
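The checks described in the last two paragraphs can be sketched for a single variable using Python’s standard `statistics` module. The example assumes that missing values are stored as `None` and that the valid range is known in advance. The data are invented for illustration.

```python
from statistics import mean, stdev

def describe(values, valid_min, valid_max):
    """Summarize one variable and flag out-of-range entries.

    Missing values are assumed to be recorded as None. The summary
    statistics use only values that are present and within range.
    """
    present = [v for v in values if v is not None]
    in_range = [v for v in present if valid_min <= v <= valid_max]
    out_of_range = [(i, v) for i, v in enumerate(values)
                    if v is not None and not valid_min <= v <= valid_max]
    return {
        "n_valid": len(in_range),
        "n_missing": len(values) - len(present),
        "mean": mean(in_range) if in_range else None,
        "sd": stdev(in_range) if len(in_range) > 1 else None,
        "min": min(in_range, default=None),
        "max": max(in_range, default=None),
        # (row index, value) pairs to check against the original records
        "out_of_range": out_of_range,
    }

# Hypothetical ratings on a 1-7 scale, with a typing error (56) and a missing value
ratings = [4, 5, 7, None, 56, 3, 6]
print(describe(ratings, valid_min=1, valid_max=7))
```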

Examining the frequency distributions of the variables also enables you to check
their shape, particularly whether or not they are approximately normally distributed
and whether there are any outlying observations that will distort the subsequent
analyses. Discrepancies in standard deviations (i.e., ones much smaller or larger than
those of other variables of the same type) often indicate problems with unreliability,
outliers, or restricted ranges, which may suggest the elimination of cases, items, or
measures before further analyses are carried out. If the data are not normally distributed,
it may be possible to transform them to bring them closer to normality (Tabachnick &
Fidell, 2013). Otherwise, nonparametric statistical tests may need to be used.
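As an illustration of these checks, the sketch below does three things. It computes a simple (population) skewness coefficient. It flags observations whose standardized score exceeds a cutoff. It then shows that a log transformation reduces the skew of a positively skewed variable. The cutoff of 3.29 (roughly p < .001, two-tailed) is one commonly used convention and is an assumption here. The data are invented.

```python
import math
from statistics import mean, pstdev, stdev

def skewness(values):
    """Population skewness: mean cubed deviation divided by the SD cubed."""
    m, sd = mean(values), pstdev(values)
    return sum((x - m) ** 3 for x in values) / (len(values) * sd ** 3)

def flag_outliers(values, cutoff=3.29):
    """Return indices of observations whose absolute z-score exceeds the cutoff."""
    m, sd = mean(values), stdev(values)
    return [i for i, x in enumerate(values) if abs(x - m) / sd > cutoff]

skewed = [1, 1, 2, 2, 3, 4, 8, 20]        # hypothetical, positively skewed
logged = [math.log(x) for x in skewed]    # log requires all values > 0

print(f"skewness before: {skewness(skewed):.2f}, after log: {skewness(logged):.2f}")

scores = [5] * 20 + [50]                  # hypothetical: one extreme score
print("outlying rows:", flag_outliers(scores))
```

If a variable contains zeros, a small constant is usually added before taking logs. Which transformation is appropriate depends on the shape of the distribution.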


Data Reduction 

Data reduction involves condensing the data, so that it is more manageable and easier 
to analyze. One obvious approach consists of simply dropping some of the variables 
from the data set. Researchers are often overambitious in the beginning stages of a 
project and then realize at the start of the analysis that they have more variables than
they know what to do with. Such planning errors can often be corrected by eliminating 
variables from the analyses. It is usually better to focus your energy on thoroughly 
analyzing a few important variables, rather than struggling to analyze everything that 
you optimistically included because you thought it might be interesting to look at. 

Once the basic variables have been decided upon, the data set can be reduced by 
summing or averaging the items of any multi-item scales to provide a total score or 
subscale scores (e.g., by using the SPSS Compute command). With a new scale, it is 
important first to conduct an item analysis (e.g., using the SPSS Reliability procedure), 
as the averaging process assumes that the items are parallel (see Chapters 4 and 6). 
Item analysis will identify bad items, that is, items which do not hang together with the 
rest of the scale, and will also show whether the scale as a whole has a high enough 
internal consistency to warrant its use as a homogeneous measure. Once these analyses 
are done, the individual item scores can be dropped from the data set or simply ignored. 

A third method of data reduction is factor analysis (Floyd & Widaman, 1995; 
Tabachnick & Fidell, 2013), a multivariate statistical technique that is designed to 
determine the structure of a set of variables. It is often used as a step in measure 
development research (see Chapter 6), to investigate the number of underlying 
dimensions of a new measure or set of measures. Factor analysis can also be used for 
data reduction, when the researcher wants to represent most of the information in a 
large number of variables by a small number of independent factors. 

Item analyses tend to be regarded as preparatory analyses and are usually reported in 
the Method section of the research paper, whereas factor analyses tend to be regarded 
as proper analyses in their own right and are usually reported in the Results section. 


Data Exploration 

The final preparatory step is to get a feel for the patterns in your data. Even if you 
are working within a hypothesis-testing framework, it is still a good idea to look 
at the data from other angles to see what else they can teach you, if only to generate 
ideas for future studies. Scientific advances often come from unexpected 
findings that purely confirmatory procedures may fail to pick up (Merbaum & 
Lowe, 1982). It is worth trying to develop a playful attitude to the analysis, 
looking at things from different angles, so that you end up feeling that you know 
the data inside out. 

Several statistical techniques have been developed to assist this process. Tukey’s 
(1977) Exploratory Data Analysis is the standard reference volume; a briefer account 
is given by Velleman and Hoaglin (2012). Exploratory data analysis (usually abbreviated 
to EDA) methods emphasize displaying the data graphically; in line with EDA’s 
spirit of taking a more playful stance towards one’s data, they often have appealing 
names, such as “stem and leaf plots” or “box and whisker plots.” Many can be done 
within SPSS. 
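
To give a flavor of these displays, here is a minimal stem-and-leaf sketch in Python. It assumes non-negative integer scores and uses the tens digit as the stem; Tukey’s versions offer more options (split stems, different leaf units), which are omitted here. The scores are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Simple stem-and-leaf display for non-negative integers (stem = tens digit)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lines = []
    for stem in range(min(stems), max(stems) + 1):  # show empty stems too
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}")
    return "\n".join(lines)

# Hypothetical questionnaire totals
print(stem_and_leaf([12, 15, 21, 23, 23, 28, 31, 34, 47]))
```

Unlike a histogram, the display keeps every individual score visible while still showing the shape of the distribution.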

It is also useful to explore correlations between the main variables, particularly 
among all the independent variable measures and among all the dependent variable 
measures. Such analyses usually reveal patterns in the data that help you to understand 
subsequent results. For example, if one criterion measure performs differently from 
the others, it is useful to have studied its patterns of correlations with the other 
variables. Similarly, repeated confirmation of hypotheses is less impressive if the 
variables in question are strongly interrelated, suggesting that they are really measuring 
the same underlying construct. 

There is a dilemma between, on the one hand, the desire to get the maximum 
mileage out of the data, by conducting many analyses, and, on the other hand, the 
need to avoid the common error of overanalyzing the data, of trying to relate 
everything to everything else (known as “fishing expeditions” or “data dredging”) 
and thereby capitalizing on chance associations, leading to high rates of spurious 
findings (Simmons, Nelson, & Simonsohn, 2011). As we discussed above, you need 
to be ruthless in prioritizing the most important variables, and associations between 
them, that you want to focus on and then omitting the rest. 


Statistical Significance Testing for Answering the Research Questions 

For some discovery-oriented research, exploratory statistical analyses may be all that 
is required. Broad research questions are addressed with exploratory analyses, most of 
which may not be precisely planned in advance; instead, the researcher will follow up 
interesting leads as the analysis progresses. On the other hand, more focused research 
questions or hypotheses call for confirmatory analyses. These are aimed at testing 
prestated hypotheses, with specific planned tests corresponding to each one. In other 
words, exploratory data analysis is inductive (and abductive), whereas confirmatory 
analysis is deductive (Tukey, 1977). 

In either case, you need to select statistical tests that are appropriate to your 
research questions and design. The complexity of the design and the nature of the 
research questions will determine the complexity of the analysis. For some designs, 
simple descriptive or correlational statistics will suffice; others may require complex 
multivariate methods. 

The choice of inferential statistical methods lies outside the scope of this text. 
Detailed treatments can be found in the standard statistics texts (e.g., Field, 2013; 
Howell, 2010; Siegel & Castellan, 1988; Tabachnick & Fidell, 2013; Winer et al., 
1991). It is also worth seeking advice from psychologist colleagues or from statisticians: 
even experienced researchers need help for more complicated analyses (although 
statisticians prefer to be consulted before the data are collected, in order to have some 
input into the design). 


Analyzing the Strength and Significance of Quantitative Effects 

The final step in analyzing the findings of a study is to evaluate the strength and 
significance of the findings: are the results substantial or are they trivial? This might 
seem to be a matter of interpretation, but statistical methods for addressing these 
issues have continued to emerge, especially over the past 25 years. In addition to 
statistical significance (based on probability), these include effect size (based on 
amount of covariation) and clinical significance (based on practical impact). 

It is helpful first to return to Cook and Campbell’s (1979) four validity types (see 
Chapter 8). We have previously examined construct validity and internal validity. 
The other two validity types are statistical conclusion validity, which we examine next, 
and external validity, which we touched on in Chapter 10 but will address further in 
the Interpretation section of this chapter. 

Statistical Conclusion Validity 

As we discussed above, statistical analysis will often demonstrate that two variables 
covary, that is, that they are associated with each other. For example, therapist empathy 
may be found to be associated with client improvement. The assessment of statistical 
conclusion validity asks whether conclusions about covariation are sound. It is a 
preliminary step before making causal inferences (which are covered under internal 
validity). Three questions need to be addressed (Shadish et al., 2002). 

First, was the study sensitive enough to permit reasonable inferences about covariation? 
Greater sensitivity is obtained either by increasing the sample size or by reducing the amount 
of error, both of which give greater statistical power (see Chapter 10). Error can be 
reduced by selecting more reliable measures, by choosing a research design that 
controls for extraneous variation (for example, a repeated-measures design), by 
selecting a homogeneous sample (see Chapter 10), or by incorporating an individual 
difference variable as an extra factor in an experimental design (see Chapter 8). 

Second, if the study was sensitive enough, do the variables in fact covary? Here the 
issue is whether the right statistical tests were performed. Did they meet the 
assumptions behind them (e.g., that the data are normally distributed)? Was an appropriate error rate 
(alpha level, i.e., the critical value of p) set? How likely is it that the results were due 
to chance variations? “Fishing expeditions,” that is, conducting a lot of significance 
tests in a large data set until something interesting turns up, will produce spuriously 
significant results. For example, if you correlate 10 variables with 10 other variables, 
you will have 100 distinct correlation coefficients. Suppose that 10 of them are 
statistically significant at a conventional alpha level of p<0.05. In this case, your best 
guess is that half of these 10 correlations (i.e., 5 out of the 100) are significant by 
chance alone, but you will have a difficult time telling which correlations are probably 
genuine and which are most likely spurious. 

Of course, if you have to conduct multiple statistical tests, the alpha level at which 
the tests are performed should be made more stringent (Howell, 2010). Unfortunately, 
this will reduce your power to detect any real effects, sometimes drastically, because, 
in clinical research, samples are generally small and difficult to obtain. 
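
The arithmetic behind the fishing-expedition example, and the cost of correcting for it, can be made concrete. The sketch below uses the Bonferroni correction (dividing alpha by the number of tests), which is one simple way of making the alpha level more stringent; other, less conservative corrections exist. The familywise figure assumes independent tests, which correlations drawn from the same data set will not be, so it is only a rough guide.

```python
n_tests = 10 * 10   # 10 variables each correlated with 10 others
alpha = 0.05

# If every null hypothesis were true, this many "significant" results arise by chance
expected_false_positives = n_tests * alpha

# Chance of at least one false positive, assuming (unrealistically) independent tests
familywise_error = 1 - (1 - alpha) ** n_tests

# Bonferroni correction: one simple way to make the per-test alpha more stringent
bonferroni_alpha = alpha / n_tests

print(f"Expected chance findings: {expected_false_positives:.1f}")
print(f"P(at least one false positive): {familywise_error:.3f}")
print(f"Bonferroni per-test alpha: {bonferroni_alpha:.4f}")
```

A per-test criterion of .0005 shows why power suffers: with a small clinical sample, only very large correlations will reach it.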

Third, if the variables do in fact covary, how meaningful is that covariation? 
This seemingly straightforward question opens up a number of difficult issues about 
how to measure the significance of the findings. There are three ways to do this, which 
we will examine in turn: statistical significance, effect sizes, and clinical significance. 

Statistical Significance 

Statistical significance defines the meaningfulness of an effect in terms of how 
improbable it is that it would occur by chance alone. However, psychologists have 
long argued that statistical significance in itself does not reveal much, and that the 
whole null hypothesis testing framework is mistaken (e.g., Bakan, 1966; Cohen, 
1990, 1994; Lykken, 1968; Rodgers, 2010). Cohen (1994) memorably entitled his 
paper “The earth is round (p<.05)”, which neatly sums up his main argument. 
One issue is the arbitrariness of the conventional criterion of p<.05, in other words 
that a result has to have a probability of less than 1 in 20 of occurring by chance in 
order to be counted as being statistically significant. A result with a p of .049 is barely 
stronger than one with a p of .051, yet the first will be reported and the second will 
not. There is no logical reason why 1 in 20 was settled on; the convention could just 
as well have been 1 in 25 or 1 in 18 (although see Cowles & Davis, 1982, for the 
historical background to the .05 value). 

A more serious issue is that, given a large enough sample size, any effect will become 
statistically significant at whatever alpha level has been set, since no null hypothesis is 
ever exactly true (Meehl, 1978). Thus a result may be statistically significant but 
practically trivial. For example, in testing a new therapy for depression, a mean 
difference of two points on the Beck Depression Inventory between the experimental 
and control group may reach statistical significance with a large enough sample, but 
it would be clinically irrelevant, especially if both groups remained severely depressed. 
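
The dependence of significance on sample size can be shown with a few lines of Python. This sketch assumes a standard deviation of 10 BDI points in each group (a hypothetical value) and uses a normal (z) approximation rather than a t-test, for simplicity. The standardized difference is 2/10 = 0.2 at every sample size, yet the p-value falls steadily as n grows.

```python
import math

def two_group_p(mean_diff, sd, n_per_group):
    """Two-tailed p for a difference between two group means, using a z
    approximation with a common, known SD (a simplifying assumption)."""
    se = sd * math.sqrt(2 / n_per_group)
    z = abs(mean_diff) / se
    return math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))

# Hypothetical: a 2-point BDI difference, assumed SD of 10 in each group
for n in (20, 200, 2000):
    print(f"n = {n:>4} per group: p = {two_group_p(2, 10, n):.3g}")
```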

A third issue is whether it is ever valid to conclude that two groups are equivalent. 
This issue has implications for hotly debated questions such as whether different 
therapies have equivalent outcomes. It is a truism in statistics that failing to show a 
difference between two groups does not mean that they are equivalent. In other 
words, “you can’t prove the null hypothesis.” However, various methods have recently 
emerged for testing the approximate equality of two conditions in a trial. These 
include equivalence analysis (Rogers et al., 1993), noninferiority trials (Pocock, 
2003), and Bayesian significance testing (Dienes, 2014). These approaches use 
confidence intervals to allow researchers to demonstrate the equality, within specified 
limits, of two or more experimental or control groups (see Elliott et al., 2013, for an 
application in psychotherapy research). 
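
The confidence-interval logic of equivalence analysis can be sketched as follows. This follows the widely used "two one-sided tests" (TOST) idea: two conditions are declared equivalent at alpha = .05 if the 90% confidence interval for their difference lies entirely inside a margin fixed in advance. It is not necessarily the exact procedure of the sources cited above. The normal approximation, the differences, the standard errors, and the 3-point margin are all hypothetical.

```python
from statistics import NormalDist

def equivalent(mean_diff, se, margin, alpha=0.05):
    """TOST-style check: the (1 - 2*alpha) CI must lie inside (-margin, +margin).
    Normal approximation; the margin should be set in advance on clinical grounds."""
    z = NormalDist().inv_cdf(1 - alpha)          # about 1.645 for alpha = .05
    lower, upper = mean_diff - z * se, mean_diff + z * se
    return (-margin < lower and upper < margin), (lower, upper)

# Hypothetical trial: 0.8-point difference between therapies, margin of 3 points
ok_precise, ci_precise = equivalent(0.8, 1.2, 3)   # precise estimate
ok_vague, ci_vague = equivalent(0.8, 2.5, 3)       # same difference, small sample

print("precise:", ok_precise, [round(x, 2) for x in ci_precise])
print("vague:  ", ok_vague, [round(x, 2) for x in ci_vague])
```

The second case illustrates the truism above: the difference is no larger, but the interval is too wide to rule out a clinically important difference, so equivalence cannot be claimed.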

A fourth issue is that the traditional null hypothesis testing framework is answering 
the wrong question. What we really want to know is: given the data, how likely is it 
that our hypothesis is true? However, null hypothesis testing addresses a different 
question: given the null hypothesis, how likely are our findings? Two statistical 
approaches provide researchers with alternative ways of examining significance. The first 
is to use confidence intervals to give probabilistic parameter estimates (American 
Psychological Association, 2010b; Cumming & Finch, 2005); the second, more 
radical, alternative is to work within a Bayesian, rather than a frequentist, statistical 
framework (Dienes, 2011, 2014). 

Effect Sizes 

One potential solution to the problems of statistical significance is to also evaluate the 
meaningfulness of findings in terms of the amount of covariation, that is, the effect size 
(see our discussion of statistical power analysis in Chapter 10). There are a number of 
different effect size measures, depending on the statistical comparison being carried out 
(Cohen, 1992; Vacha-Haase & Thompson, 2004). The basic principle is to create an 
index of the strength of the relationship between two variables that is independent of 
the sample size. In general, effect sizes should always be reported along with statistical 
significance tests, as they provide important information for interpreting findings. 

The calculations involved are best illustrated by considering a simplified one-group 
pretest-posttest comparison. In this case, the most appropriate effect size measure is 
the difference between the pre- and the post-therapy mean scores, divided by the 
pooled standard deviation for pre- and post-therapy. For example, in a study of virtual 
reality exposure therapy for fear of flying (Rothbaum, Hodges, Smith, Lee, & Price, 
2000), the mean pre-treatment score on the Fear of Flying Inventory was 105.85, the 
mean post-treatment score was 86.14 and the pooled standard deviation can be 
calculated as 36.65. The effect size therefore is (105.85 - 86.14)/36.65, which comes 
to 0.54 (Cohen, 1988, classifies 0.50 as a medium effect: see Chapter 10). Thus, 
using effect-size measures, we can say that these clients showed, on average, a 
moderate improvement in self-reported fear over the course of the therapy. 
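
The calculation can be written out in Python. The means and pooled SD are those of the fear-of-flying example; the separate pre and post SDs are not given here, so the `pooled_sd` helper is included only to show one common way of pooling two SDs from the same clients (equal n at both time points), and is not applied to the example.

```python
import math

def pooled_sd(sd_pre, sd_post):
    """Pool two SDs measured on the same clients (equal n at pre and post)."""
    return math.sqrt((sd_pre ** 2 + sd_post ** 2) / 2)

def pre_post_effect_size(pre_mean, post_mean, sd_pooled):
    """Mean pre-post improvement in pooled-SD units (positive = scores fell)."""
    return (pre_mean - post_mean) / sd_pooled

# Values from the fear-of-flying example (Rothbaum et al., 2000);
# the pooled SD of 36.65 is taken as given.
d = pre_post_effect_size(105.85, 86.14, 36.65)
print(f"effect size = {d:.2f}")
```

With several outcome measures, the same function can be applied to each, and the results averaged to give a mean effect size for the study.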

Effect-size metrics have an important additional advantage: they make it possible to 
compare the strength of findings across different studies, using meta-analytic procedures 
(see Chapter 3). Their relevance to analyzing the strength of findings from a single 
study is twofold. First, it is often useful to use meta-analytic methods to compare and 
combine results within a study: for example, when you have several different outcome 
measures, effect sizes can be used to characterize the largest or smallest areas of 
change, and also to provide an overall summary index for the study (mean effect size). 
Second, effect sizes can be used to compare the results of a single study with the 
findings from other similar studies in the literature. This involves calculating the effect 
sizes and comparing them with the corresponding effect sizes obtained in other 
studies or from meta-analyses, thus revealing how the findings fit in with the rest of 
the literature. 

Clinical Significance 

One problem with effect sizes is that they compare differences in mean scores against 
the standard deviations of the groups, rather than against any absolute standard. 
Although effect sizes are more meaningful than p-values, the presence of a large effect 
size still does not guarantee that a result is clinically meaningful. For example, in a 
two-group experimental design, a large effect could be due to small standard deviations 
in the experimental and control groups, rather than a substantial difference in the 
means themselves. A pre-post study of a psychological intervention could have a large 
effect size, but the clients may not feel much better after it. 

The search for a way of capturing which findings are clinically important and which 
are trivial has led to the development of indices of clinical significance, most closely 
associated with the work of Jacobson and his colleagues (e.g., Jacobson, Roberts, 
Burns, & McGlinchey, 1999; Jacobson & Truax, 1991). Such indices are now 
routinely incorporated into studies where clinical change is assessed. In contrast to 
both statistical significance and effect sizes, clinical significance methods examine 
change at the level of each individual, rather than averaging across groups of individuals. 
This enables one to address the commonsense question about a therapy: what 
proportion of its clients get better? 

[Figure 12.1: score distributions of the dysfunctional and functional populations, 
with cut-off lines a, b, and c] 

a. The area to the right of this line depicts the scores corresponding to criterion 1. 

b. The area to the right of this line depicts the scores corresponding to criterion 2. 

c. This line represents the mid-point between the means of the dysfunctional 
and the functional groups. 

Figure 12.1 Three criteria for clinical significance. From Jacobson & Truax (1991), 
© 1991 the American Psychological Association. Adapted by permission. 

These ideas are again best illustrated in the context of psychotherapy outcome research. 
The first thing is to ascertain whether there is reliable change, that is, whether the 
observed change is greater than the fluctuations that might be expected to arise from 
unreliability in the measuring instrument. Jacobson and Truax (1991) present a formula 
for a Reliable Change Index (RCI), which involves calculating the cut-off value that any 
pre-post change has to exceed in order for it to be considered as reflecting more than just 
random error of measurement (see http://www.psyctc.org/stats/rcsc.htm for details on 
carrying out the calculations). However, reliable change is clearly just an initial 
requirement: the more important issue is whether any change is clinically meaningful. 
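
The Reliable Change Index can be sketched in Python following the Jacobson and Truax (1991) formulation: the standard error of measurement is the SD of the measure times the square root of (1 minus its reliability), the standard error of the difference is the square root of twice its square, and the RCI is the pre-post change divided by that difference SE. The client scores, the SD of 7.5, and the reliability of .85 are hypothetical; in practice they would come from the measure’s published norms or the study’s own pretest data.

```python
import math

def s_diff(sd, reliability):
    """Standard error of the difference between two scores."""
    se_measurement = sd * math.sqrt(1 - reliability)
    return math.sqrt(2 * se_measurement ** 2)

def reliable_change_index(pre, post, sd, reliability):
    """Pre-post change expressed in units of the SE of the difference."""
    return (post - pre) / s_diff(sd, reliability)

# Hypothetical client on a measure where lower scores mean less distress
sd, reliability = 7.5, 0.85
rci = reliable_change_index(pre=28, post=14, sd=sd, reliability=reliability)

# |RCI| > 1.96: change unlikely (p < .05) to reflect measurement error alone.
# Equivalently, the raw change must exceed 1.96 * s_diff points.
threshold = 1.96 * s_diff(sd, reliability)
print(f"RCI = {rci:.2f}; change needed = {threshold:.1f} points; reliable: {abs(rci) > 1.96}")
```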

The concept of clinical significance attempts to encapsulate quantitatively what 
we usually mean when we say that an intervention with an individual client was 
successful, which is that the client’s level of functioning (in terms, say, of a depression, 
anxiety, or self-esteem score) has substantially improved after the intervention 
(Kraemer et al., 2003; Lambert & Ogles, 2009). 

Substantial clinical improvement normally implies one of three things: (1) that the 
client is no longer in the abnormal range, (2) that they are back in the normal range 
(sometimes known as “high endstate functioning”), or (3) that they are at least half-way 
between the two. These different ways in which a successful outcome may be conceptualized 
can be expressed more formally (Jacobson & Truax, 1991; Lambert & Ogles, 2009): 

1. That the client’s post-intervention score no longer represents abnormal 
functioning, that is, it has moved outside the range of the dysfunctional 
population. Outside the range is usually defined as more than two standard 
deviations away from the mean of the dysfunctional population. 

2. That the client’s post-intervention score represents a return to normal functioning, 
that is, it has moved inside the range of the functional population. Inside the 
range is usually defined as being within two standard deviations of the mean of 
the functional population. 

3. That the client’s post-intervention score is more likely to be in the functional 
than the dysfunctional population. This is usually defined as being closer to the 
mean of the functional population than the dysfunctional population. 
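
The three cut-offs can be computed from the means and standard deviations of the two populations. The sketch assumes, as in Figure 12.1, that higher scores indicate better functioning; for measures scored the other way, the signs flip. For criterion 3 it uses the Jacobson and Truax (1991) formula, which weights the midpoint by the two SDs so that the cut-off is the same number of SDs from each mean; with equal SDs it reduces to the simple midpoint. The population values are hypothetical.

```python
def clinical_cutoffs(m_dys, sd_dys, m_func, sd_func):
    """Cut-offs for criteria 1-3, assuming higher scores = better functioning."""
    a = m_dys + 2 * sd_dys    # criterion 1: beyond the dysfunctional range
    b = m_func - 2 * sd_func  # criterion 2: within the functional range
    # criterion 3: point equally far, in SD units, from both population means
    c = (sd_func * m_dys + sd_dys * m_func) / (sd_dys + sd_func)
    return a, b, c

# Hypothetical norms: dysfunctional mean 40 (SD 10), functional mean 70 (SD 10)
print(clinical_cutoffs(40, 10, 70, 10))
```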


These three criteria are illustrated in Figure 12.1 (adapted from Jacobson & Truax, 
1991), which locates the cut-off points for each of the three possible criteria on the 
distributions of the dysfunctional and functional groups. Which of the three criteria to 
adopt in any given study depends on which of the three ways best fits your 
conceptualization of a significant outcome of the intervention that you are researching. 
Criterion 3 (line c in Figure 12.1) represents a possible compromise, since it is defined in 
terms of where the “weight of the evidence” falls. In measures like the CORE Outcome 
Measure (Barkham et al., 2001), criterion 3 falls about 1.25 standard deviations above the 
mean for the normal population, where it provides a useful cut-off between “normal” and 
“moderately distressed.” 


INTERPRETATION 


Interpretation is the subject of the Discussion section of a research report 
and involves: 

• understanding the meaning of the findings, in terms of previous research 
and theory, and how they contribute to knowledge; 

• assessing the strengths and limitations of the study; 

• considering the scientific and practical implications of the findings. 


Analysis yields the basic findings of the study; interpretation attempts to spell out 
their implications or broader meanings. Within the quantitative tradition, analysis is 
often a technical exercise, which follows set rules and requires expertise more than 
inspiration, whereas interpretation requires imagination and insight into the psychological 
meaning of the phenomena. It is a broader conceptual task, aimed at bringing the 
results of the study to bear on the issues that initially inspired it. In qualitative research, 
the distinction between analysis and interpretation is not so clearly drawn; nevertheless, 
there is still room for taking a broader view of one’s findings or representations of 
the data. 

Interpretation is the main topic of the Discussion section of a research article; it consists 
of three related parts, which we will cover in the order that they usually occur in that 
section. The first is to understand the meaning of the findings, in terms of previous research 
and theory. The second is to assess the strengths and limitations of the study, to see whether 
it can really support the interpretations that you bring to it. The third is to address the 
scientific and practical implications of the findings: what research needs to be done next 
and how might the findings inform clinical practice and possibly also social policy. 


Contributions to Knowledge: Understanding the Meaning of the Findings 

Having examined the strength and significance of the findings, the next step is to 
understand what they mean. This task involves relating the findings back to the literature: 
the research, theory, or conceptual model that the study was based upon. How do 
the data answer the research questions? Do they support or contradict the underlying 
theoretical model? How do you explain any discrepancies between your expectations 
and the findings? What is the study’s contribution to knowledge: how does it change 
our understanding, and what is new or important about it? (These are all questions that 
journal reviewers will ask if you submit your study for publication.) 

You may also wish to speculate more imaginatively about what the findings mean. 
Speculation is quite acceptable if labeled as such: that is, if you warn your readers that 
you are not claiming that your speculations are securely grounded in the evidence. 

“The facts are friendly” 

Although it is much easier said than done, it is worth reminding yourself to approach 
this task with an open mind. Try not to be defensive or dogmatic about your theories, 
but allow the data to speak for themselves. Carl Rogers used to say “the facts are 
always friendly” (Kirschenbaum, 1979: 205); in other words, do not fight the results, 
even if they cause you discomfort. The opposite of the attitude of openness is to deny 
the validity of the results if they conflict with your preconceived ideas: rather than 
adjusting your theories, you adjust reality instead. Research may force us to rethink 
our ideas, which can be a painful process, but if we are unwilling to revise our views, 
what is the point of doing the research in the first place? 

In practical terms, this attitude of openness to your data means that you should not 
give in to the temptation to skim through your results attending only to the findings 
that confirm your hypotheses. In fact, the results that deviate from your expectations 
are worth at least as much reflection and discussion as those that confirm them, 
because they are the key to revising your understandings of what you are studying, as 
well as improving your design, data collection, and analysis methods. 

Finally, researchers also need to consider the weaknesses of their chosen theoretical 
explanation. Could the findings be explained in other ways than in terms of your pet 
theory? Are they compatible with other frameworks than your own? How would a 
neuroscientist or a psychoanalyst view them? Often these questions lead on to 
developments or refinements in the original theoretical model, which then lead on to ideas 
for further research (much like the cycle of inquiry that we described in Chapter 2). 


Methodological Issues: Strengths and Limitations of the Study 

All studies have their strengths and limitations. The task for researchers (and readers) 
is to weigh up the relative merits and flaws of any study. Some weaknesses are trivial 
and can be mentioned in passing; others will be more serious but may—or may not— 
be offset by particular strengths (Rozin, 2009). 

The seriousness of a study’s weaknesses determines how much credence can be given 
to its findings. Researchers need to ask whether there are any problems with their study 
that might have influenced the results: they owe it to themselves and their readers to 
make these explicit. (In qualitative research this process is sometimes called 
discounting.) The physicist Richard Feynman forcefully expressed his belief in this aspect of 
scientific honesty: 

It’s a kind of scientific integrity, a principle of scientific thought that corresponds to 
a kind of utter honesty—a kind of leaning over backwards. For example, if you’re 
doing an experiment, you should report everything that you think might make it 
invalid—not only what you think is right about it: other causes that could possibly 
explain your results. Details that could throw doubts upon your interpretation must 
be given, if you know them. You must do the best you can—if you know anything at 
all wrong, or possibly wrong—to explain it. (Feynman, 1985: 341) 

It is useful at this point to return to the various methodological frameworks relevant 
to appraising particular types of research. For example, Cook and Campbell’s (1979) 
framework of validity threats in experimental research is useful not only for planning 
a study, but also for making sense of its findings and reflecting on its strengths and 
limitations. Reviewing the lists of validity threats helps to identify statistical 
conclusion validity issues, internal validity problems, construct validity confounds, 
and external validity limitations. Was the statistical power adequate? (A perennial issue.) 
Are there plausible third variables that might account for your findings equally well? 
In what other ways could your findings be explained, in addition to the variables that 
your research has focused on? How much conceptual overlap is there between 
predictor and criterion variables? In what ways do you think your sample might be 
unrepresentative of typical clinical populations, such as by being volunteers or early 
adopters of new treatments (see the following section on external validity)? 

Similarly, if your study was qualitative, it is useful to revisit the guidelines for good 
practice described in Chapter 5. For example: Have you described your commitments 
and expectations that may have affected the results? Was there enough data to reach 
saturation of categories? Did you use auditing procedures, member checks or multiple 
qualitative analysts? 

External Validity 

An important aspect of assessing the strengths and limitations of a study is to ask to 
what extent its findings can be generalized beyond its immediate context. This is the 
external validity question (Shadish et al., 2002; see also Chapter 10). It examines the 
representativeness of the study: its range of application across persons, settings, and 
times. Any peculiarities of the sample, procedures, setting, or timing will reduce the 
external validity. 

A dilemma for researchers is that the demands of external validity and those of 
statistical conclusion validity and internal validity often conflict. For example, one way 
to reduce error, and therefore to increase the statistical conclusion validity of the 
study, is to draw the sample from a homogeneous target population. However, this 
will make the sample less representative and thus lower the study’s external validity. 
Furthermore, randomized controlled trials are designed to have high internal validity, 
but this may be at the expense of having procedures unrepresentative of normal 
clinical practice (see Chapter 8). As frequently happens with decisions in research, 
there is no clear-cut answer here. 

The role of external validity in qualitative research is controversial (Braun & Clarke, 
2013). Some qualitative researchers reject the whole notion of representativeness, 
arguing that they are attempting to develop particular contextualized understandings 
of particular cases, and that generalizability is not an issue. Others argue that the 
representativeness of their sample (referred to as “horizontal generalization”) is not 
the issue, but what is important is whether the theoretical ideas that are generated are 
capable of broader application (“vertical generalization”). Yet others argue that 
external validity is important in qualitative research, and can be obtained by careful 
sampling in order to draw a qualitatively representative sample, that is, a sample that 
includes the important variations and aspects of the phenomenon being studied. 
In any case, possible sample limitations need to be considered when drawing conclusions 
from a qualitative study, if that study claims to be interested in developing general 
knowledge about a phenomenon. 

Replication 

The best way to increase external validity is via replication. The more that you can 
reproduce the initial findings under diverse conditions, the more convincing they will 
become. Lykken (1968), drawing from Sidman (1960), distinguished three types of 
replication: (1) literal replication is an exact duplication of the study conducted by the 
original researchers using identical procedures; (2) operational replication is carried 
out by other researchers, using the methods published by the authors of the original 
study; and (3) constructive replication replicates the basic idea of the study, but uses 
different methods, for example, a different population or alternative measures of the 
same constructs. If the results of a study hold up under several constructive 
replications, we can begin to develop an understanding of the range of situations and persons 
to which its results generalize. 

Research programs often begin with laboratory studies, which have low external 
validity, because it can make sense to start with a simple test of one’s theories in 
such a setting rather than in an expensive and time-consuming field study. For example, 
early behavior therapy studies used college student volunteers who had spider phobias 
for which they had not sought help. However, if the first laboratory or analog studies 
prove to be successful, then the researcher needs to conduct more ecologically valid 
studies to give the findings credibility. On the other hand, a researcher with primarily 
applied interests might want to begin with small-scale field studies, thus avoiding the 
initial laboratory work. 

Psychology in general is currently undergoing a “replication crisis,” that is, a 
concern that many published findings are one-off instances, often caused by false 
positives, that cannot be replicated (Pashler & Wagenmakers, 2012; 
Simmons et al., 2011). The problem partly stems from the editorial policies of journals, 
which are reluctant to publish null findings (e.g., failures to replicate) and also prefer 
to publish original research rather than replications. Added to this are recent 
high-profile cases of fraudulent publication in social psychology (see Pashler & 
Wagenmakers, 2012). The Nobel laureate Daniel Kahneman has said, with 
respect to research on social priming, that he fears a “train wreck looming” unless 
researchers get their collective house in order and start replicating each other’s 
work (see Yong, 2012). 


Scientific and Practical Implications 

Scientific Implications 

As we discussed in Chapters 1 and 2, research is often a circular process, in that the data 
may only partially answer the research questions. Often the study reveals, with the 
benefits of hindsight, that the research questions could have been better formulated, 
that the theory on which it was based was inadequate, that there were measurement or 
design weaknesses, or perhaps that the approach shows promise and could be expanded 
or applied more broadly. All of these conclusions have scientific implications, in that 
they lead naturally to a plan for future research in the area. 

Practical Implications 

Finally, in clinical research it is important to consider what implications your findings 
have for professional practice. What would you suggest that practitioners (e.g., therapists, 
teachers) do, or not do, differently on the basis of your study? For example, the 
Dimidjian et al. (2006) RCT that we used as a running example in Chapter 8 has the 
implication that practitioners attempt behavioral activation with their severely 
depressed clients. 

Laying out the possible practical implications of your study often feels 
intimidating, as it requires you to extrapolate what would follow from your results if 
they held true for the clinical situations and populations that you were interested 
in when you started the research. A useful starting point is to ask yourself a 
question that examiners like to ask candidates at defense or viva exam meetings: 
How has your research changed your own practice? What, if anything, have you 
learned from your study that you have put to use in your work with clients? 
Sometimes this is a direct result of having your hypotheses confirmed or 
disconfirmed, but often it is more subtle learning, such as an increased awareness of 
certain client issues. Both obvious and more subtle learnings are worth noting as 
practical implications. 

Consideration of the practical implications of the study leads naturally on to the 
final section, dissemination, which considers how the findings and their implications 
will be made known to people who might use them. 


DISSEMINATION 


• Dissemination is the last step in the research process: it is often difficult but 
ultimately rewarding. 

• Research may be disseminated through a variety of outlets, from academic 
journals and conferences through to more popular media. 

• It is worth paying close attention to your writing, to tell your story as clearly 
and economically as possible. 

• Research does not always translate directly into wider use: several factors 
influence whether findings are acted upon. 


In the end, research is basically a public activity. In the midst of it, you may feel as though 
you are doing it for yourself alone, but ultimately your goal is to communicate your findings 
to others, often with the aim of achieving some kind of change. Usually this 
communication involves a written report or published article, but it may also involve presentation in 
the research setting, at conferences, or for policy-makers in government departments. 
Writing up 

Writing up the project can seem a mountainous task. It is easier if you start the process 
early on and think of the write-up as accumulating progressively by a series of successive 
approximations over the course of the project. Ideally, it is worth starting to plan the 
report and writing a first draft of the Introduction and Method sections while you are 
collecting your data. 

Having said that, many people resist writing up, for various reasons. For some, 
putting pen to paper, or rather fingers to keyboard, is the hardest step of all. It is 
arduous and intellectually demanding, since it forces you to present your ideas in 
a clear and watertight way. Fear of criticism—one’s own or others’—can lead to 
procrastination. Also, the workload and the emotional stresses of many clinical 
jobs can make it hard to find the time for writing. 

Good research reports are usually simple. They tell the story of the research 
project, sticking to the main themes without bogging the reader down in irrelevant 
detail. It is worth continually bearing in mind the one or two key questions that 
guided the investigation (or at least the part of it that you are writing up) and 
structuring your report around those themes. This is often hard to do at the end of 
a study, because by then you may have lost perspective on what is important and cannot 
see the wood for the trees. Try to step back and distance yourself from your study 
(this is not easy if you have just spent months fretting over it), attempting to see it 
through the eyes of a general informed reader, as though it were done by someone 
else. Get criticism from trusted colleagues. Presentations at seminars or 
conferences are often a good way of shaping up your work and getting other people’s 
reactions to it. 

Writing Style 

Not only should you try to tell a simple and clear story, you should also try to tell it in simple 
and clear prose. The novelist Philip Pullman expresses this nicely: 

The aim must always be clarity. It’s tempting to feel that if a passage of writing is 
obscure, it must be very deep. But if the water is murky, the bottom might be only an 
inch below the surface—you just can’t tell. It’s much better to write in such a way that 
the readers can see all the way down; but that’s not the end of it, because you then 
have to provide interesting things down there for them to look at. (Pullman, 2002) 

Journal articles in psychology, and in the social sciences generally, are notorious for 
using incomprehensible jargon or overelaborate sentence constructions. Much 
psychology writing is impenetrable or pretentious. Oppenheimer (2005) parodied 
this tendency in the title of his paper “Consequences of erudite vernacular utilized 
irrespective of necessity: problems with using long words needlessly” and his findings 
showed that the strategy of using long words backfired, in that writers using complex 
expressions were judged to be less intelligent. 

George Orwell’s much quoted spoof rewrite of a Biblical passage (see box) is a 
paradigmatic example of the contrast between vigorous writing and psychological 
waffle. Many style guides are available to help combat the tendency to write like 
Orwell’s second paragraph. We recommend Lanham’s (2007) amusing and 
idiosyncratic Revising Prose, which presents a 10-step method for editing one’s own 
writing, Strunk and White’s (2000) miniature gem The Elements of Style, and Williams’s 
(2007) Style: Lessons in Clarity and Grace. 


George Orwell’s parody of psychological writing. 

Here is a well-known verse from Ecclesiastes: 

I returned and saw under the sun, that the race is not to the swift, nor the 
battle to the strong, neither yet bread to the wise, nor yet riches to men of 
understanding, nor yet favor to men of skill; but time and chance happeneth 
to them all. 

Here it is in modern English: 

Objective consideration of contemporary phenomena compels the conclusion 
that success or failure in competitive activities exhibits no tendency to be 
commensurate with innate capacity, but that a considerable element of the 
unpredictable must be taken into account. 

(Orwell, 1946/1968: p. 156; reproduced by permission of the estate of the late S. M. B. Orwell.) 


Psychology journals have complicated stylistic requirements of their own, which 
can deter novice authors. The APA Publication Manual (American Psychological 
Association, 2010b) is comprehensive and detailed, ranging from general issues 
about layout and style to minutiae about where to put commas in the reference 
list. It also includes a helpful section on how to use inclusive language that does 
not express prejudice about gender, ethnicity, etc. Sternberg and Sternberg (2010) 
summarize the stylistic guidelines and provide useful advice on how to write up 
a project. 


Publication 

It is always worth considering publishing your research, even though your primary 
aim may be to write it up for a course requirement or part of a local evaluation. 
The initial research report itself may not be very useful. Dissertations and theses are 
often long, formalized, and indigestible, and evaluation reports are usually geared to 
the interests of a local audience. On the other hand, research reports in professional 
journals at least aim for brevity and comprehensibility. If the study does not meet the 
exacting methodological standards of the APA or BPS flagship journals, consider less 
demanding outlets, often those attached to specific divisions, sections, or interest 
groups of a professional body. You may also want to present your work at conferences, 
which is often a good stepping stone to publication. 
The process of submitting an article for publication is as follows: 

1. Identify the target journal that you are aiming for, as different journals have different 
requirements, both in terms of content and style. The anticipated readership of the 
journal will partly determine what material to include and how to present it. 

2. The journal website will describe the topics and kinds of articles that the journal 
publishes, and give a link to tables of content of recent issues. It will also have a 
link labeled something like “instructions for authors,” giving the journal’s stylistic 
rules and instructions for submitting manuscripts. 

3. Once submitted, if your paper does not meet the broad requirements of the 
journal or it is considered to be seriously flawed, the editor will send it straight 
back without additional review. Otherwise, it will be sent out to reviewers, who are 
usually blind to the authors’ identity. Reviewers are normally asked to return their 
reviews to the editor within a month, but they often take longer. On the basis of 
the reviews, the editor then makes one of the following decisions: accept the paper 
immediately (this is very rare), accept it subject to specified amendments, request 
revisions and resubmission, or reject it. Mainstream journals tend to have high 
rejection rates: for example, in 2013, the Journal of Consulting and Clinical 
Psychology had a rejection rate of 79% (American Psychological Association, 2014). 

4. The editor will email their decision to you with copies of the reviewers’ 
comments. You then have to decide how to proceed, based on your perception of 
the reviewers’ and editor’s views of your work. Bad reviews, whether negative 
or sloppy, can be upsetting. The whole process may be 
somewhat arbitrary, in that different reviewers do not always agree (Fiske & 
Fogg, 1990). Even if you get unfavorable reviews, it is important not to give 
up. A thick skin is needed: if your paper is rejected, it is better not to take it too 
personally, and to move on to resubmitting it elsewhere. At least try a couple of 
other journals before concluding that your work is not worth publishing. 
The potential pleasure and professional recognition that come from getting 
your work in print repays some investment of effort and emotional energy. 


Authorship Issues 

If the research has been done as part of a team (or if your supervisor has had a major 
input), the issue arises of who will be listed as authors and in what order. 

Authorship issues can often arouse feelings of competition or resentment in the 
research team and so it is helpful to start discussing them early on in a research project 
(see Chapter 3). However, sometimes everyone’s contributions can only be evaluated 
once the study is completed. 

To be listed as an author, an investigator should have made a substantial scientific 
contribution to the paper, for example, a major contribution to the formulation or 
design (American Psychological Association, 2002, Standard 8.12; British Psychological 
Society, 2011; Fine & Kurdek, 1993). Minor contributions that do not merit authorship, 
for example, help with interviewing or data analysis, or a senior doctor’s permission to 
study patients under his or her care, should be mentioned in the acknowledgements 
section. The order of authorship should reflect each person’s contribution. 
Utilization 

Ideally, research should serve a purpose, not just be an empty exercise. You can increase 
the likelihood of its having an impact if you actively promote your findings. Articles 
in academic journals are read by only a select few, so it is worth putting effort into 
disseminating your work more widely. This can be done informally, by discussing the 
research with people who might use it to make decisions (e.g., managers, policy-makers, 
government officials). Or it could be done by writing more accessible articles for the 
general public, for example, in newspapers, magazines, blogs or specialist websites. 
You may even want to write a press release in order to interest radio, television, or print 
journalists in your findings. Many institutions have a press or media office to assist 
researchers with dissemination. Email listservs and social media can also be used to 
alert people to your study. 

The process of how research findings get taken up, if at all, is not always clear. 
Academic research can feel narrow and inward looking, with researchers writing 
primarily for other researchers, but sometimes findings do permeate through to 
influence practice. Weiss (1972, 1986) has developed models of research 
utilization, the thrust of her work being that the relationship between research and 
policy is nonlinear and complex (for further discussion, see also Patton, 2008; 
Humphreys, 2003; Shadish et al., 1991). The diffusion of innovation model 
(Rogers, 2003) examines how innovations in general, such as technologies or public 
health programs, are taken up (or not) over time. The naive idea that research 
has a direct influence upon policy is rarely borne out in practice. Often research is 
ignored, misapplied, or used to buttress only one side of an argument. There is now 
an academic specialty, with its own journal, Implementation Science, devoted to 
issues of translating research into policy or practice (Madon, Hofman, Kupfer, & 
Glass, 2007). 

However, there are cases where, for better or worse, research findings appear to 
strike a chord and become incorporated into far-reaching policy decisions. The studies 
carried out in the 1950s and 1960s, demonstrating the dehumanizing effects of long-stay 
institutions, provide an example of research influencing practice. Although this 
research may have been used simplistically, and possibly only as a cover for cost cutting 
by governments, it also made forceful points that have lasting resonance about the 
psychological damage done by institutionalization. 


THE END 

By the time you have finished writing up your study—particularly if it is a student 
project—you may be thoroughly fed up with it and painfully aware of its flaws. You 
may even be thinking of giving up research altogether. Although we understand that 
reaction, having felt it many times ourselves, we hope that you will give yourself a 
well-earned break and that after you have recovered you will return to do more 
research. Consider going back to the drawing board and using your hard-won wisdom 
to start the cycle again by designing a better study. The field of clinical psychology 
needs to strengthen its knowledge base through high-quality research. Psychologists 
who can draw on both research and clinical skills—whom we would call good scientist- 
practitioners—are central to this process. 


CHAPTER SUMMARY 

The final stage of the research process consists of three interrelated parts: analysis, 
interpretation, and dissemination. Analysis means establishing what the findings are, 
and in particular how the data answer the research questions. 

Qualitative analysis is a largely inductive procedure. Although different 
qualitative orientations use different approaches to analysis, some general 
principles can be outlined. Analysis starts with preparing and organizing the data 
(usually involving transcribing interviews). The analysis itself involves three related 
processes: identifying meanings in the data, grouping ideas into categories or 
themes, and integrating the themes into a conceptual framework. Qualitative 
analysis can be done using either a within-case (idiographic) or a cross-case approach. 

Quantitative analysis may be either exploratory or confirmatory, depending on 
the type of research questions. It involves preparing the data, exploring their 
properties, conducting statistical tests to address the research questions, and evaluating 
statistical significance. The concepts of statistical conclusion validity, effect size, and 
clinical significance are typically used to evaluate the strength and meaningfulness of 
the findings. 

Interpretation and dissemination are the final stages of the research process. First, 
the researcher needs to evaluate the meaning of the results, including their 
implications for scientific knowledge, their methodological strengths and shortcomings, and 
their practical, professional, and policy implications. Dissemination involves 
communicating both the findings and your understanding of them to other people. It is often 
a difficult but rewarding task, as it involves painstaking attention to the detail of what 
you are trying to communicate. Research may be disseminated through a variety of 
outlets, from academic journals and conferences through to popular media or social 
media. It does not always translate directly into practice: a complex set of factors 
influences whether research is ever acted upon. 


FURTHER READING 

Robson (2014) covers the fundamental steps in both quantitative and qualitative 
analysis, and also discusses the dissemination of research. Statistical methods are 
explained in the standard texts, for example, Field (2013) and Howell (2010) for a 
general coverage, Siegel and Castellan (1988) for non-parametric methods, Winer et 
al. (1991) for analysis of variance, and Tabachnick and Fidell (2013) for multivariate 
analysis (and also a good account of the preliminary steps in data preparation). 

There are a number of sources for qualitative analysis. Some present general 
approaches (e.g., Miles et al., 2014; Patton, 2002) or give an overview of a range of 
approaches (e.g., Camic et al., 2003; Creswell, 2013; Willig, 2013); others present 
one specific approach in detail. Among the latter are Braun and Clarke (2006, 2013) 
on thematic analysis, Corbin and Strauss (2015) on grounded theory, Smith et al. 
(2009) on interpretative phenomenological analysis, Hill et al. (2011) on consensual 
qualitative research, and Potter and Wetherell (1987) and Potter (2012) on 
discourse analysis. 

Shadish et al. (2002) are, as always, a key source in thinking generally about 
interpreting findings. It is worth reading the first few pages of Jacobson and Truax’s 
(1991) classic paper on clinical significance, and there is a further discussion of effect 
size and clinical significance in Kraemer et al. (2003) and Lambert and Ogles (2009). 
Finally, Carol Weiss’s (1972, 1986) seminal work on the uptake of research by policy 
makers is always worth revisiting. 


QUESTIONS FOR REFLECTION 

1. To get a feel for qualitative analysis, make an attempt at analyzing the main 
themes in a short extract from an interview, either one you or a colleague have 
conducted, or you could use the brief extract given in Chapter 6. 

2. What do you think about the practice of estimating missing data? Is it cheating? 
When do you think it would be justified? When might it be a bad idea? 

3. In your own research, what would each index of significance (statistical 
significance, effect sizes, and clinical significance) tell you? 

4. It’s not too early: Spend a few minutes contemplating the possible real-world 
applications or implications that your results might have: (a) if you find what you 
expect/hypothesize; or (b) if you fail to find what you expect/hypothesize. Write 
these down now rather than waiting until you’re too exhausted in the last stage of 
writing up. 

5. Surveys show that most psychologists don’t publish their thesis or dissertation. 
Why do you think this is? 

6. Writing up research can be difficult. At a personal level, what beliefs or feelings 
get in your way? How can you minimize their impact? 
“Thus scientific methodology is seen for what it truly is—a way of preventing 
me from deceiving myself in regard to my creatively formed hunches which 
have developed out of the relationship between me and my material.” (Rogers, 
1955, p. 275) 

We started this book by comparing a research project to telling a story. Our own 
narrative of how a research project progresses is now drawing to a close. The book has been 
structured around the four basic stages of the research process, that is, the steps that 
researchers go through when they are carrying out a project (although they do not 
usually go through them in the neatly ordered sequence that we have depicted). These 
stages are: (1) groundwork, (2) measurement, (3) design, and (4) analysis, 
interpretation, and dissemination. Separating out the important issues according to the stage in 
which they are prominent is helpful both in planning and also in reading research. 

This brief final chapter brings together some central ideas that run through the book: 
methodological pluralism (matching the method to the problem), appraising research, 
and combining research with practice. Finally, we end with some images of research. 

Methodological Pluralism 

Our central theme has been methodological pluralism: that no single approach to 
research is best overall; rather, what is important is that the methods be appropriate 
for the questions under investigation. It can also be labeled appropriate methodology, 
by analogy with the catchphrase “appropriate technology” (although strictly speaking 
the word methodology should only be used in its precise meaning of the study of 
methods). No single research method is inherently superior to any other: all methods 
have their relative advantages and disadvantages. 


Research Methods in Clinical Psychology: An Introduction for Students and Practitioners , 
Third Edition. Chris Barker, Nancy Pistrang, and Robert Elliott. 

©2016 John Wiley & Sons, Ltd. Published 2016 by John Wiley & Sons, Ltd. 
Companion Website: www.wiley.com/go/barker 





Epilogue 


However, we want to make it clear that methodological pluralism is not equivalent to 
methodological anarchism. Unlike Feyerabend (1975; see Chapter 2), we are not saying 
that “anything goes.” Quite the reverse: we have tried to outline methodological rules 
and principles within the context of each method. Some of these principles are common 
to all approaches; some are relevant only to specific research approaches or genres. 

Wherever they are applied, the central purpose of these methodological rules is, as 
Carl Rogers aptly said in the quotation forming the epigraph to this chapter, to prevent 
one’s deceiving oneself and others by drawing conclusions that are not supported by 
the data. In terms of the simple model of research that we presented in Chapter 2, 
the essence of the research attitude is finding ways to test your ideas against your 
experience of the world. 

This book has covered both traditional, quantitative methods and the more recent 
(at least within psychology) qualitative methods. Although the proponents of each 
approach have sometimes represented warring factions, we believe that the debate has 
often been too polarized and that it is possible, indeed desirable, to combine multiple 
methods within a single study or research program. 

Our message is not that knowing about research methods enables you to produce 
the perfect piece of research. However, we do hope that, having read this book, you 
will be better able to make informed choices in your own research—or at least 
informed compromises. As we have said throughout, there are always compromises 
and trade-offs in research. However, although we are saying that there is no one right 
way to do it, there definitely are wrong ways. Consideration of how research might be 
done badly leads into the next section, on appraising research. 


Appraising Research 

We have aimed to give readers the conceptual tools needed to become both better 
producers and better consumers of research. Throughout the book we have pointed 
to issues that need to be considered in evaluating studies, whether from the standpoint 
of the researcher or the reader. Here, we will focus on the consumer’s perspective, and 
address how the concepts that we have raised in the context of planning and executing 
research can be applied when you are reading and evaluating research reports. 

The more you know about research methods, the more you are able to recognize 
the problems in a piece of research. However, this does not mean that appraisal equals 
negative criticism. It is easy to criticize research. Psychology training often does a 
good job of teaching students how to pull studies apart; it is usually less good at 
giving them a sense of perspective. Our own students are often quick to find numerous 
flaws (many of which we consider to be trivial) in research papers, but they are less 
able to take a broader view and to see things in balance. 

Thinking about the different stages of the research process helps you to 
conceptualize some of the important issues to consider when evaluating a piece of research. We 
have discussed general criteria that apply to all types of research as well as some that 
apply to specific approaches. We present these general criteria below, organized 
roughly according to how they fit within each of the stages in the research process. 
Although we have tried to make them as generally applicable as possible, not all of the 
criteria will apply to all pieces of work, nor will each criterion have equal weight. 
Criteria for evaluating research 

Groundwork. The first question to ask is “Who cares?” In other words, is the 
topic worth researching, has it been done before, does the study have the 
potential to add useful knowledge or to develop theory? 

Is the literature review relevant and up to date? Does it cover empirical, 
methodological, and theoretical issues, and does it place the study in the context of 
scientific research in the area? Is the rationale for the study clearly articulated 
(possibly including an indication of the conceptual model linking the variables 
under investigation)? Are appropriate research questions or hypotheses clearly 
formulated? 

Measurement. How are the main constructs defined and measured? Do the 
measurement methods (used in a broad sense to cover both quantitative and 
qualitative methods) adequately capture the constructs of interest? Are the 
methods appropriate to the research questions? If quantitative measures are 
used, do they have acceptable reliability and validity? If qualitative methods 
are used, have they been placed in the context of the researcher’s theoretical 
orientation? 

Design. Are the procedures described in sufficient detail to enable the 
reader to understand what was done and, if necessary, to be able to 
replicate the study? Is the design of the study appropriate to the research 
questions? Will it enable the desired inferences to be drawn? In quantitative 
studies, what are the threats to internal, external, construct, and statistical 
conclusion validity? 

Is the sampling procedure clearly specified? Are the size and composition of 
the sample appropriate to the research questions and data analysis? Does the 
study conform to the relevant professional and ethical standards? 

Analysis. Do the analyses address each of the research questions? Are the data 
presented clearly and coherently? (Any tables and figures should be both 
understandable and informative.) In quantitative studies, are data reduction 
techniques and statistical tests used correctly? In qualitative studies, is the 
analysis grounded in examples, and have procedures been included to check 
the credibility of the analysis? 

Interpretation. Are the findings interpreted in the context of the research 
questions and the wider theoretical context in which the work was carried 
out? Are interpretations supported by the data? (However, speculations are in 
order if labeled as such.) Are competing explanations for the findings 
considered? Is the generalizability of the findings assessed? Are weaknesses of the 
study addressed? Are the scientific and practical implications of the findings 
discussed? 

Presentation. Is the paper readable? (Consider its general prose style, use 
of jargon and sexist or other offensive language.) Is the paper’s length 
appropriate for its content? 
In arriving at an overall evaluation of a research paper, take into account that 
strengths in some areas may compensate for weaknesses in others (Rozin, 2009). 
Rather than simply listing the flaws of a study, try to estimate how they distort 
the conclusions of the research. Some technical flaws in the procedure may have a 
negligible impact on the results. The more important or innovative the topic or 
methods, the more forgivable should be any shortcomings: it is relatively easy to do 
methodologically sound but trivial research; it is harder to do innovative research that 
makes a scientific or practical impact. 


Combining Research with Practice 

As we hope to have made clear throughout this book, we believe there are benefits to 
be gained from the scientist-practitioner approach in its various expressions. Research 
helps advance practice by developing and testing new procedures; practice helps 
advance research by providing a source of, and a testing ground for, new ideas and 
methods, and by giving a reminder of the complexity of human behavior that helps 
counterbalance the simplifying tendency of much research. Researchers and
practitioners have not always seen eye to eye: they have different needs and live in different
worlds—even when they are the same person. Despite these difficulties, we believe 
that the relationship between the two activities can ultimately be mutually enriching. 

However, although a scientist-practitioner approach is good for the field as a whole, 
it does not follow that combining the two activities is right for everyone. We recognize 
that actually carrying out research may not be everyone’s cup of tea. As we discussed 
in Chapter 2, although there are many positive reasons for becoming involved in 
research, there are also several reasons why combining research and practice is
problematic. Different individuals will weigh up each of these reasons differently, and
decide to what extent, if at all, they want to be involved in conducting research. We 
do maintain, however, that at a minimum, practitioners need to be sufficiently 
informed about research methods to be able to understand and appraise research, even 
if they are not actually doing it themselves. 

In the past, the scientist-practitioner role has been identified with a narrow
conception of research, which has put many psychologists off attempting to do their own
research. It is certainly true that some types of research are prohibitively complicated, 
costly, and time-consuming for the individual practitioner: randomized comparative 
therapy outcome research being the prime example. It is also unrealistic to expect that 
most practitioners would have the resources to conduct the kind of research that 
meets the exacting requirements of the major scientific journals. However, it is
possible to work within a broader conceptualization of the scientist-practitioner model
and generate practice-based evidence. We have tried to outline some possible methods 
that can be adopted by practitioners working on their own (especially the small-N 
approaches outlined in Chapter 9 and the evaluation methods in Chapter 11). 

Another strategy to increase one’s involvement in research is to make it a group 
endeavor. For us, one of the central pleasures of research is the process of working 
with students and colleagues. Teamwork provides stimulation through discussing 
mutually interesting ideas and struggling to resolve disagreements or differences in 
perspectives. It also brings support to what can otherwise be a lonely enterprise. 
We would strongly echo Hodgson and Rollnick’s (1996) advice to form a research
team if at all possible. 

We hope to have conveyed the message that research need not be as forbidding a 
process as is often imagined. Some of the old rigidities are now dissolving; it is widely 
recognized that there are many different approaches to research. It is possible to 
work in a genre of research that suits your own values, abilities, and ways of thinking. 
However, research is not to be undertaken lightly. It requires time and effort, and it
can be an intellectual struggle: rigorous thinking does not usually come easily. But 
we hope that ultimately the potential enjoyment and stimulation that research gives 
will encourage at least some of our readers to consider becoming involved in it 
themselves. 


Some Images of Research 

We began this book with the metaphor of research as a story. We will end with three 
additional images of the research process. 

First, research can be understood as a need, a modern expression of the innate,
universal human need to understand and master one’s environment. This is how 
Cook and Campbell (1979) see research, within the framework of “evolutionary
epistemology,” that is, as part of what has made it possible for human beings to survive and
prosper. This may be seen as an aspect of the growth tendency (Rogers, 1961), or as 
a biologically adaptive, inherited trait of curiosity, or simply as the “joy of knowing.” 
In any case, research is one of the primary contemporary vehicles for living out this 
basic part of what it means to be human. We are not saying that research is always fun 
(much of it is drudgery!), but only that it is marked by moments of understanding 
and accomplishment which make it all worthwhile. 

Second, research is a journey, a metaphorical going out on a voyage of exploration,
like Odysseus or Jason in Greek mythology, or Darwin’s voyage of discovery on the 
Beagle. This voyage begins with optimism and excitement but often veers toward 
danger, risk, and disappointment, sometimes almost crashing on the rocks of rigid 
methodology or running the danger of being sucked down into a whirlpool of
confusing alternatives (Kvale, 1996). Sometimes, we make it home with a research project,
and sometimes it fails and is never heard from again, buried at the back of a filing 
cabinet. But generally, if we persist, we will return safely with a story or two to tell 
about the adventure. 

Third, research is a cultural tradition, a means of developing and communicating
important ideas between people and across time. The methods and concepts we use 
are part of the syntax and vocabulary of this tradition; we use them as a modern
rhetoric (Rennie, 2012), to establish ourselves and our work as credible and persuasive.
But this also means that research is not complete until it is communicated to others. 
Viewing research as a cultural tradition also suggests that we can expect our field’s 
research methods to continue to evolve and change, and to produce new forms and 
contents. This is simply what it means for something to be a living tradition: if it were 
set in stone, it would be dead. 

Finally, being part of a tradition means that as you proceed, you first borrow (with 
credit) some of the tradition’s voices; then you master the techniques, which means 
that you make some of these voices your own; and then you allow the tradition to
speak through you. In the end, perhaps the best goal for a researcher is this: to become 
part of the tradition of knowledge and method in your field, adding your voice to 
those who have gone before, speaking to and through those who come after you, 
even when they no longer recognize that it is your voice, among many others,
that speaks in them. In this book, we have tried to provide a basic foundation in the 
tradition of clinical research, and we have invited you to step inside and find your own 
voice within it. 


QUESTIONS FOR REFLECTION 

1. Throughout this book, we have advocated methodological pluralism. Do you 
think that it is possible to carry this too far, or that such an approach might ask 
too much of fledgling researchers? What reasonable alternatives to such an 
approach might there be? 

2. Find a study that is close to your topic of interest and apply the list of “Criteria 
for Evaluating Research” to it. How easy or difficult was it to do that? What did 
you learn? 

3. Our sense of what scientific research means to us can change over time and is very 
individual. This chapter contains several common metaphors or images of the 
research process, but there are many more, such as mining/archeology, tree/plant,
pushing a boulder uphill, and so on. What metaphors or images for the
research process fit your experience? 

4. Having finished reading this book (or just this chapter), what questions are you 
left with? What have we left out or not given enough attention to? What else 
would you have liked to have heard about? (Please consider emailing us to let us 
know your thoughts.) 